The Relationship Between Latent Dirichlet Allocation and Google Rankings

John Vantine | September 10th, 2010

SEOMoz is creating quite a buzz in the SEO community with their research and findings on Latent Dirichlet Allocation (or LDA).

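…Huh? Admittedly, the term itself is intimidating. If you haven’t already stopped reading due to that monstrous terminology, here is the short version: LDA is a topic-modeling algorithm that treats each document as a mixture of topics, and each topic as a group of words that tend to appear together. Applied to search, it can estimate how relevant a page is to a keyword based on the context in which words are used in a site’s content, and the relationships between those words.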

So how does this research relate to SEO?

People have been using the phrase “content is king” for years. When writing content, SEOs will often include a handful of specific keywords that they’re targeting, in an effort to ultimately list higher in the SERPs for those keywords.

These latest LDA findings suggest that simply including these keywords in your title tags/headers/content is not enough. Google appears to detect and reward content of substance: high-quality content that is on-topic and uses semantically related terminology which falls closely in line with the keyword(s) that you’re trying to list for.

So… How do I… Uhh?

Bear with me here. SEOMoz recently released an experimental tool (currently free for public use) that attempts to show us the relevancy between our keyword and the content on our site. Unlike the concept as a whole, the tool itself is pretty straightforward.

Enter your keyword into the tool, along with the content that you’re trying to list for that keyword. Or, if you’d like the tool to examine the content of an entire site, enter the URL. The tool will return a score (in percentage form) that represents the cosine similarity between the keyword and the content. The LDA Tool also returns a set of phrases and words that are (likely) closely connected in a term vector model.
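If “cosine similarity” sounds abstract, here is a minimal Python sketch of the idea. To be clear, this is not how the SEOMoz tool works internally: the tool compares keyword and content in LDA topic space, while this toy version simply compares raw word counts. The function names `term_vector` and `cosine_similarity` are made up for illustration.

```python
import math
from collections import Counter


def term_vector(text):
    """Turn text into a sparse vector of word counts (an assumed, simplified model)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors stored as dict-like objects."""
    dot = sum(count * b.get(term, 0) for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector has no direction, so call it "no similarity"
    return dot / (norm_a * norm_b)


keyword = term_vector("garlic meatballs")
content = term_vector("meatballs with garlic")
# Multiply by 100 to get a percentage-style score like the one the tool reports.
print(round(100 * cosine_similarity(keyword, content), 1))
```

A score near 100 means the two vectors point in almost the same direction, and a score of 0 means they share no terms at all.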

The research done by SEOMoz thus far shows a noteworthy correlation between LDA cosine scores and Google rankings. The correlation is 0.33. To put this in perspective, a correlation of 1.0 would mean that a site’s LDA cosine score for a particular keyword aligns perfectly with its position in the SERPs for that same keyword, while a correlation of 0 would mean there is no relationship at all.
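To get a feel for what a number like that measures, here is a rough Python sketch of a rank correlation between SERP positions and LDA scores. It assumes a Spearman-style rank correlation with no tied values, which is my assumption and may not match the method used in the research. The positions and scores below are invented purely for illustration.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward. Assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman rank correlation for two equal-length lists without ties."""
    n = len(x)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


# Invented example data: SERP positions 1-5 and made-up LDA scores (percent).
positions = [1, 2, 3, 4, 5]
lda_scores = [70, 80, 60, 40, 50]

# Position 1 is the best result, so negate positions to make "higher = better"
# before comparing them with scores, where higher is also better.
print(spearman([-p for p in positions], lda_scores))
```

If the highest score always belonged to result #1, the second highest to result #2, and so on, this would print 1.0. Real data is much messier, which is why 0.33 counts as noteworthy.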

What do I do with this information?

Try running the tool with your main keyword and the content of your biggest competitor(s), and compare the percentage to yours. Whose content is more relevant to the keyword at hand, according to this tool? Who lists higher in the SERPs for that keyword?

You can expand on this idea as well. Make a list of the sites that appear on page one of the SERPs for your keyword. Include some other ranking factors too: things like page title, number of linking domains (to the URL) and number of linking domains (to the top level domain). You could also include metrics like PageRank, domain PageRank, page authority (from SEOMoz), and more. Don’t forget to include the LDA cosine score for each!

Top 10 results in Google for meatballs.

An SEO analysis chart with all of these metrics, in addition to LDA cosine score, gives you a better idea of what you're up against.
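If you’d rather keep that chart in a script than in a spreadsheet, something like the following Python sketch would do. The `Result` fields and the `format_chart` helper are hypothetical names, and every URL and number in the sample rows is a placeholder, not real data.

```python
from dataclasses import dataclass


@dataclass
class Result:
    position: int         # position on page one of the SERPs
    url: str
    linking_domains: int  # number of linking domains to the URL
    page_authority: int
    lda_score: float      # LDA cosine score, in percent


def format_chart(results):
    """Return a plain-text chart of the results, ordered by SERP position."""
    lines = [f"{'#':>2}  {'URL':<24} {'Links':>6} {'PA':>4} {'LDA %':>6}"]
    for r in sorted(results, key=lambda r: r.position):
        lines.append(
            f"{r.position:>2}  {r.url:<24} {r.linking_domains:>6} "
            f"{r.page_authority:>4} {r.lda_score:>6.1f}"
        )
    return "\n".join(lines)


# Placeholder rows: these values are invented and only show the layout.
rows = [
    Result(2, "example.com/recipes", 120, 55, 75.0),
    Result(1, "example.org/film", 900, 80, 50.0),
]
print(format_chart(rows))
```

Lining the LDA score up next to link and authority metrics makes it easier to spot cases where a lower-scoring page outranks a higher-scoring one.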

You can also use this tool to A/B test copy for your pages. Perhaps write two different versions of the copy on your main page, and then run both versions through the tool. Which one scores higher?

Keep in mind that the tool is relatively new (released on August 31st), and not only is it still being developed, but people are still finding out how to use it and what to do with the insight that it provides.

How do we use this new insight to our advantage?

A definite answer to that question has yet to be determined. Some have suggested using tools like Google’s Wonder Wheel, the “related searches” feature, or Google Sets to find related keywords to include and elaborate on. While these are all helpful tools (when used properly), it all comes back to the importance of creating context-sensitive content. The goal is not to simply fit as many keywords (or variations of a keyword) as possible on a page. It’s about the relevance of the topic(s) on the page, according to the search engine.

Avoid topical ambiguity by using synonyms and contextually relevant words. When writing content, try to ensure that most of your terminology reinforces the main keyword/focus of your article. To continue with the “meatballs” sample above… If you’re writing about the type that accompanies spaghetti, use words like “parmesan”, “Italian”, “tomatoes”, “garlic”, etc. to support your theme and make it abundantly clear what you’re writing about. If you’re writing about the late ’70s movie about summer camp, you’d want to use words like “Bill Murray”, “comedy”, “campfire”, “pranks”, “camp counselor”… You get the gist.

Remember, LDA is (suspected to be) just one of 200+ signals in Google’s ranking algorithm. Many other factors are at play here, and while generally speaking result #1 will tend to have a higher relevance score than result #5, this will not always be the case. This can be seen in the screenshot of the “meatballs” SERPs above. The IMDB page has an LDA score of 42%. So why is it #1? Look at the domain/page authority and the number of links.

Note that the tool may return different scores for the same query when run multiple times. Due to the nature of the query being performed, only a sample of the topics that the keyword/content could fit with is checked. Because of this, results can be a bit inconsistent, usually varying by 1-5%.

Google’s goal is to deliver the most relevant content possible, and the release of this tool has shed a bit of light on one of the algorithms that they use in attempting to reach that goal.

Does this information represent a new era of SEO, or some revolutionary new optimization techniques that will need to be employed? No. You still need quality keyword-rich content and relevant links to that content in order to list well. But perhaps this new research will make people think twice before outsourcing content writing for $5 an article. Quality content gets results.
