Thursday, 21 January 2010

Google synonyms and latent semantic indexing

The latest Google blog post trumpets the search engine's progress in understanding synonyms (different words with the same meaning). Steven Baker, a software engineer at the company, does a very good job of explaining how the search engine produces SERPs containing sites that feature terms related, but not necessarily identical, to a user's query.

To use an example featured in the blog, a search for 'song words' will bring up links to websites featuring the terms 'song lyrics' and 'music lyrics'.
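As a rough illustration of the idea (and certainly not Google's actual method), synonym matching can be sketched as expanding each query word into a set of acceptable equivalents before comparing the query with a page. The SYNONYMS table and both function names below are invented for this example; a real search engine would learn such relationships from data rather than from a hand-written table.

```python
# Hypothetical synonym table, written by hand purely for illustration.
SYNONYMS = {
    "words": {"words", "lyrics"},
    "song": {"song", "music"},
}

def expand_query(query):
    """Return one set of acceptable terms per word in the query."""
    return [SYNONYMS.get(word, {word}) for word in query.lower().split()]

def page_matches(query, page_text):
    """True if the page contains at least one acceptable term for every query word."""
    page_words = set(page_text.lower().split())
    return all(terms & page_words for terms in expand_query(query))

print(page_matches("song words", "Find music lyrics for every artist"))
print(page_matches("song words", "Guitar lessons for beginners"))
```

In this toy version the first page matches even though it contains neither 'song' nor 'words', which is the behaviour Baker's example describes.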

Baker's blog gives us an interesting insight into Google's advanced use of latent semantic indexing, the method the search engine applies to decide which SERPs a specific website will appear in.

What is latent semantic indexing?

When Google indexes a web page it examines the content on offer, crawling over words and phrases to determine the focus and subject of a site. Using this information, the search engine passes judgement on a domain and ranks it in results pages for the searches it deems appropriate and relevant.

For example:

A writing website which repeatedly uses the phrase 'SEO copywriting' in its content will rank in SERPs for the search 'SEO copywriting'.


Google also looks to see how relevant a phrase is - searching for related content across a website - before ranking a domain for a particular term in specific results pages. This is known as latent semantic indexing.

Going back to the website in the previous example:

Google will recognise that a domain is related to 'SEO copywriting' if its content includes terms like 'search engine optimised content', or 'SEO-friendly copywriting'. In this instance, the site is judged to be relevant to the topic of 'SEO copywriting' and the website is likely to appear in SERPs for this query.
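The principle can be sketched in a few lines of Python. This is a deliberately simplified stand-in: it just checks how many phrases from a hand-picked list appear on a page, and it makes no claim to reflect how Google scores relevance. The RELATED_PHRASES list and the function names are assumptions made up for the example.

```python
# Hypothetical list of phrases related to the topic 'SEO copywriting'.
RELATED_PHRASES = [
    "seo copywriting",
    "search engine optimised content",
    "seo-friendly copywriting",
]

def related_phrase_hits(page_text, phrases=RELATED_PHRASES):
    """Return the related phrases that occur in the page text."""
    text = page_text.lower()
    return [phrase for phrase in phrases if phrase in text]

def topical_coverage(page_text, phrases=RELATED_PHRASES):
    """Fraction of the related phrases found on the page (0.0 to 1.0)."""
    return len(related_phrase_hits(page_text, phrases)) / len(phrases)

page = ("We write search engine optimised content and offer "
        "SEO-friendly copywriting for small businesses.")
print(related_phrase_hits(page))
print(round(topical_coverage(page), 2))
```

The sample page never uses the exact phrase 'SEO copywriting', yet it still covers two of the three related phrases, so a scheme like this would treat it as relevant to the topic.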

Why does Google use latent semantic indexing?

The process is designed to establish relevancy. Before the introduction of latent semantic indexing, websites could appear in SERPs for irrelevant terms if a single keyword was repeated enough times in their content.

Furthermore, SEO copywriters could abuse the system, stuffing a site with a single keyword in order to rank for a particular query.

Obviously, this wasn't ideal; Google couldn't reliably rank websites, while readers were often subjected to nonsensical, keyword-heavy copy which was designed to push a site onto page one for a term.
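To see why raw repetition is such a weak signal, consider a simple keyword density calculation: the share of a page's words taken up by a single keyword. The sketch below is illustrative only; the function name and sample texts are invented, and nothing here is drawn from a real ranking algorithm.

```python
from collections import Counter

def keyword_density(text, keyword):
    """Share of the words in text that are exactly keyword (case-insensitive)."""
    words = text.lower().split()
    if not words:
        return 0.0
    return Counter(words)[keyword.lower()] / len(words)

stuffed = "copywriting copywriting best copywriting cheap copywriting"
natural = "our team writes clear web content and copywriting for clients"

print(keyword_density(stuffed, "copywriting"))
print(keyword_density(natural, "copywriting"))
```

A ranking based only on this number would favour the stuffed text, where four of the six words are the keyword, over the natural sentence, where it appears once in ten words.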

Lizz Sheppard explains the theory behind latent semantic indexing:

"The purpose of latent semantic indexing techniques is to create web content that can be indexed closer to the way a human would rank the page. If a human were reading the page and ranking it against the other web pages that use the same keywords, the repetition of the words would not be a factor in how well the page ranks.

"Instead, the actual information given on the page would be important. The use of related words signifies that the subject is being thoroughly covered."

What are the implications of latent semantic indexing for SEO copywriting?

Latent semantic indexing has seen the death of keyword stuffing. It's a waste of time to repeatedly crowbar specific words into copy; search engines are smart enough to recognise spam content and a site will be penalised for the practice.

Baker's Google blog serves as a reminder that the search engine is getting very good at recognising – and rewarding – relevant, natural content.

Instead of focusing on one specific term, content should embrace a variety of related words and phrases. On a basic level, a healthy spread of keywords makes it easier for Google to establish a site's relevancy to a particular subject and rank it accordingly. The more relevant a site is to a specific term, the higher it will appear in SERPs.

"Create a useful, information-rich site and write pages that clearly and accurately describe your content."

Google content guidelines

Equally, alternative keywords open up the possibility that a domain will draw traffic from search terms which were not originally considered.

"Think about the words users would type to find your pages and make sure that your site actually includes those words within it."

Google content guidelines

There are a number of tools which can suggest alternative keywords. Google Insights for Search lets you compare the popularity of different user searches, while also offering related terms. Google AdWords is also a reliable resource for phrases you may not have previously considered, while KwMap and SEMrush come highly recommended.
