
SEOMoz is creating quite a buzz in the SEO community with their research and findings on Latent Dirichlet Allocation (or LDA).

…Huh? Admittedly, the term itself is intimidating. If you haven’t already stopped reading due to that monstrous terminology, LDA is an algorithm that determines relevance of keywords based on the context in which they are used in a site’s content, and the relationships between the words in the content.

So how does this research relate to SEO?

People have been using the phrase “content is king” for years. When writing content, SEOs will often include a handful of specific keywords that they’re targeting, in an effort to ultimately list higher in the SERPs for those keywords.

With these latest LDA findings, it has come to light that simply including these keywords in your title tags, headers, and content is not enough. Google appears to detect and identify content of substance: high-quality content that is on-topic and uses semantically related terminology that falls closely in line with the keyword(s) that you’re trying to list for.

So… How do I… Uhh?

Bear with me here. SEOMoz recently released an experimental tool (currently free for public use) that attempts to show us the relevancy between our keyword and the content on our site. Unlike the concept as a whole, the tool itself is pretty straightforward.

Enter your keyword into the tool, along with the content that you’re trying to list for that keyword. Or, if you’d like the tool to examine the content of an entire site, enter the URL. The tool will return a score (in percentage form) that represents the cosine similarity between the keyword and the content. It also returns a set of phrases and words that are (likely) closely connected in a term vector model.
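If "cosine similarity" sounds abstract, here is a minimal sketch of the idea in Python. Big caveat: this is not how the SEOMoz tool works internally. It compares raw word counts, whereas the tool compares topic distributions produced by LDA. The function names and the sample copy are made up for illustration.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lowercase the text, split it into words, and count each word."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors (0.0 to 1.0)."""
    shared_terms = a.keys() & b.keys()
    dot = sum(a[term] * b[term] for term in shared_terms)
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text has no direction to compare
    return dot / (norm_a * norm_b)


# Hypothetical keyword and page copy, for illustration only.
keyword = "meatballs"
copy_a = "Italian meatballs with garlic, parmesan and tomatoes over spaghetti"
copy_b = "Our summer camp offers canoeing, archery and campfire songs"

score_a = cosine_similarity(term_vector(keyword), term_vector(copy_a))
score_b = cosine_similarity(term_vector(keyword), term_vector(copy_b))
print(f"copy A: {score_a:.0%}  copy B: {score_b:.0%}")
```

Notice that this word-count version only rewards copy that contains the exact keyword. The whole point of a topic model like LDA is to also give credit for related words, so treat this as a picture of the scoring math and nothing more.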

The research done by SEOMoz thus far shows a noteworthy correlation between LDA cosine scores and Google rankings. The correlation is .33. To put this in perspective, a correlation of 1.0 would mean that a site’s LDA cosine score for a particular keyword aligns perfectly with its position in the SERPs for that same keyword.
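To make the correlation figure a little more concrete, here is a rough sketch of how you could measure it yourself for a single SERP. It assumes a rank correlation (Spearman's), which this post does not confirm as the method behind the .33 figure, and every number in it is invented for illustration.

```python
import math


def ranks(values):
    """Rank values from 1 (smallest) upward, averaging the ranks of ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across any run of equal values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Plain Pearson correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up data: SERP positions 1-5 and an LDA score for each result.
positions = [1, 2, 3, 4, 5]
perfect_scores = [90, 80, 70, 60, 50]  # best score sits at position 1
mixed_scores = [42, 80, 75, 60, 55]    # a weak score sits at position 1

# Negate positions so that "higher score, better position" comes out positive.
better = [-p for p in positions]
perfect = spearman(perfect_scores, better)
mixed = spearman(mixed_scores, better)
print(perfect, mixed)
```

The first data set is the "aligns perfectly" case and comes out at 1.0. The second shows how a single strong page with a weak score can drag the number down for one SERP, which is why a figure like .33 only means something when it is measured across many keywords.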

What do I do with this information?

Try running the tool with your main keyword and the content of your biggest competitor(s), and compare the percentage to yours. Whose content is more relevant to the keyword at hand, according to this tool? Who lists higher in the SERPs for that keyword?

You can expand on this idea as well. Make a list of the sites that appear on page one of the SERPs for your keyword. Include some other ranking factors as well… Things like page title, number of linking domains (to the URL) and number of linking domains (to the top level domain). You could also include metrics like pagerank, domain pagerank, page authority (from SEOMoz), and more. Don’t forget to include the LDA cosine score for each!

Top 10 results in Google for meatballs.

An SEO analysis chart with all of these metrics, in addition to LDA cosine score, gives you a better idea of what you're up against.
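If you would rather keep this chart in a script than a spreadsheet, something like the following would do. It is only a sketch: the `SerpRow` fields, the helper function, the URLs, and all of the numbers are hypothetical placeholders, so swap in whichever metrics you actually collect.

```python
from dataclasses import dataclass


@dataclass
class SerpRow:
    position: int         # position on page one of the SERPs
    url: str
    linking_domains: int  # number of domains linking to the URL
    lda_score: float      # LDA cosine score, as a percentage


def outranked_by_weaker_content(rows, my_url):
    """Pages that rank above my_url despite a lower LDA score."""
    mine = next(row for row in rows if row.url == my_url)
    return [
        row for row in rows
        if row.position < mine.position and row.lda_score < mine.lda_score
    ]


# Placeholder data, not real measurements.
rows = [
    SerpRow(1, "https://example.com/a", 1200, 40.0),
    SerpRow(2, "https://example.com/b", 300, 85.0),
    SerpRow(3, "https://example.com/mine", 45, 78.0),
]

for row in outranked_by_weaker_content(rows, "https://example.com/mine"):
    print(row.url, row.linking_domains, row.lda_score)
```

Any page this flags is beating you on something other than content relevance, so links and authority are the next columns to look at.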

You can also use this tool to A/B test copy for your pages. Perhaps write 2 different versions of the copy on your main page, and then run both versions of the copy through the tool. Which one scores higher?

Keep in mind that the tool is relatively new (released on August 31st), and not only is it still being developed, but people are still finding out how to use it and what to do with the insight that it provides.

How do we use this new insight to our advantage?

A definite answer to that question has yet to be determined. Some have suggested using tools like Google’s Wonder Wheel, the “related searches” feature, or Google Sets to find related keywords to include and elaborate on. While these are all helpful tools (when used properly), it all comes back to the importance of creating context-sensitive content. The goal is not to simply fit as many keywords (or variations of a keyword) as possible on a page. It’s about the relevance of the topic(s) on the page, according to the search engine.

Avoid topical ambiguity by using synonyms and contextually relevant words. When writing content, try to ensure that most of your terminology reinforces the main keyword/focus of your article. To continue with the “meatballs” sample above… If you’re writing about the type that accompanies spaghetti, use words like “parmesan”, “italian”, “tomatoes”, “garlic”, etc. to support your theme and make it abundantly clear what you’re writing about. If you’re writing about the late ’70s movie about summer camp, you’d want to use words like “Bill Murray”, “comedy”, “campfire”, “pranks”, “camp counselor”… You get the gist.

Remember, LDA is (suspected to be) one of 200+ signals in Google’s ranking algorithm. Many other factors are at play here, and while generally speaking result #1 will tend to have a higher relevance score than result #5, this will not always be the case. This can be seen in the screenshot of the “meatballs” SERPs above. The IMDB page has an LDA score of 42%. So why is it #1? Look at the domain/page authority and the number of links.

Note that the tool may return different scores for the same query when run multiple times. Due to the nature of the query being performed, only a sample of the topics that the keyword/content could fit with are checked. Because of this, results can be a bit inconsistent – usually from 1-5%.
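One practical consequence: a single run is not enough to call a winner when two versions of your copy score within a few points of each other. A simple workaround is to run each version several times and compare the ranges. The sketch below assumes you have copied the scores down by hand; the function names and numbers are invented, and no tool API is being called.

```python
from statistics import mean


def summarize_runs(scores):
    """Average score and spread (max minus min) across repeated runs."""
    return {"mean": mean(scores), "spread": max(scores) - min(scores)}


def clearly_better(runs_a, runs_b):
    """True only if the worst run of A still beats the best run of B."""
    return min(runs_a) > max(runs_b)


# Hypothetical scores (in percent) from three runs per version of the copy.
version_a = [72, 75, 71]
version_b = [70, 73, 69]
version_c = [60, 62, 61]

print(summarize_runs(version_a))
print(clearly_better(version_a, version_b))  # ranges overlap
print(clearly_better(version_a, version_c))  # ranges do not overlap
```

The "worst run beats best run" rule is deliberately strict. If the ranges overlap, the difference between the two versions is probably within the tool's own noise.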

Google’s goal is to deliver the most relevant content possible, and the release of this tool has shed a bit of light on one of the algorithms that they use in attempting to reach that goal.

Does this information represent a new era of SEO, or some revolutionary new optimization techniques that will need to be employed? No. You still need quality keyword-rich content and relevant links to that content in order to list well. But perhaps this new research will make people think twice before outsourcing content writing for $5 an article. Quality content gets results.


8 thoughts on “The Relationship Between Latent Dirichlet Allocation and Google Rankings”
  1. Dom says:

    Stop, stop…just stop. You had me at Latent Dirichlet Allocation.

  2. Keith says:

    This is nothing new; Google has used this for many years. It’s the same thing as Latent Semantic Indexing. If you say there was a “fork in the road”, Google determines whether you are talking about a road and cars or food at a table, basically based on the context of the words around it.
