For SEOs, a firm understanding of search engines’ crawling and indexing behavior is part of the job. We understand that the more optimized a website is for SEO, the more frequently it will be crawled. And the more often a website is crawled, the more likely it is to perform well on the SERPs.
It all boils down to instructing web crawlers on how to process and prioritize site content. And sometimes judging which, when, and how to implement the appropriate directive can be tricky!
From Disallow Tags to Rel-Canonicals, check out this list of 5 common crawl and index terms. We’ll tell you what they do and give pro tips on when to implement them!
1. THE DISALLOW TAG
A disallow directive (often loosely called a "disallow tag") is placed in a website’s robots.txt file to instruct web crawlers to ignore a particular page or folder.
User-agent: *
Disallow: /wp-admin/
Pro Tip: Use the disallow directive when there are sections of a site you want completely screened off from web crawlers. It can be formatted to block all web crawlers, specific crawlers, specific pages or folders, and/or specific URL parameters. All reputable search engines, such as Google and Bing, will honor it.
That said, some web crawlers simply ignore disallow instructions. If there are sections of a site you want truly off-limits to crawlers, putting them behind a login page is the safest option.
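Whether a rule-abiding crawler may fetch a given URL can be checked with Python's standard-library `urllib.robotparser`. A minimal sketch using the wp-admin rule above (the example.com URLs are illustrative):

```python
from urllib.robotparser import RobotFileParser

# Parse a robots.txt body and check whether a given user agent
# may fetch a URL. The rules mirror the wp-admin example above.
robots_txt = """\
User-agent: *
Disallow: /wp-admin/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

print(parser.can_fetch("*", "https://example.com/wp-admin/settings"))  # blocked
print(parser.can_fetch("*", "https://example.com/blog/post"))          # allowed
```

This checks fetch permission only for crawlers that respect robots.txt; as noted above, it is not an access control.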
2. THE NOINDEX TAG
A noindex tag is a piece of HTML code that tells search engines that while they have access to crawl a certain page, it should not be indexed/included in SERPs.
<!DOCTYPE html>
<html>
<head>
  <meta name="robots" content="noindex" />
  ...
</head>
<body>...</body>
</html>
Pro Tip: Implement on a website’s exclusive pages, such as: “thank you,” “appointment confirmation,” or “member only” pages.
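One way to audit pages for the tag is to scan their HTML with the standard-library `html.parser`. A minimal sketch (the markup mirrors the example above):

```python
from html.parser import HTMLParser

# Scan an HTML document for a robots meta tag containing "noindex".
class NoindexDetector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = (attrs.get("content") or "").lower()
            if "noindex" in content:
                self.noindex = True

html = '<html><head><meta name="robots" content="noindex" /></head><body></body></html>'
detector = NoindexDetector()
detector.feed(html)
print(detector.noindex)  # True
```

Running this across a crawl of your "thank you" and confirmation pages is a quick way to confirm the tag is actually in place.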
3. NOFOLLOW TAG
A nofollow tag is a piece of HTML code that tells search engines not to pass link equity through a specific outbound link.
<a href="http://www.example.com/" rel="nofollow">Link text</a>
Pro Tip: Use for blog comments, paid links, user-generated content, embeds, or any other time you do not wish to actively endorse an external site.
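Auditing which outbound links carry the attribute can be done the same way. A minimal sketch with the standard-library `html.parser` (the URLs are illustrative):

```python
from html.parser import HTMLParser

# Collect outbound links and flag which ones carry rel="nofollow".
class LinkAuditor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []  # (href, is_nofollow) pairs

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower().split()
        self.links.append((attrs.get("href") or "", "nofollow" in rel))

auditor = LinkAuditor()
auditor.feed('<a href="http://www.example.com/" rel="nofollow">Link text</a>'
             '<a href="/about">About</a>')
print(auditor.links)
# [('http://www.example.com/', True), ('/about', False)]
```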
4. THE REL-CANONICAL TAG
A rel-canonical tag is a piece of HTML code that tells search engines which version of a URL is preferred. It is a means of preventing duplicate content from appearing in SERPs and of consolidating link equity across identical (or very similar) pages.
<link rel="canonical" href="http://example.com/blog" />
Pro Tip: Implement rel="next" and rel="prev" tags on content that spans multiple pages, such as category pages, long articles, and on-site search results.
5. XML Sitemap
The XML sitemap is a file that lists the URLs within a website, along with information such as the site’s URL hierarchy and last-modified dates. Its purpose is to help web crawlers do their job more efficiently and effectively.
<url>
  <loc>http://www.wpromote.com/</loc>
  <changefreq>weekly</changefreq>
  <priority>1.00</priority>
</url>
Pro Tip: Though XML Sitemaps are recommended for all websites, they are especially helpful for those that are: especially large, new with few external backlinks, or use rich media content.
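Sitemaps like the one above can be generated programmatically. A minimal sketch with Python's standard-library `xml.etree.ElementTree` (the URLs, change frequencies, and priorities are illustrative):

```python
import xml.etree.ElementTree as ET

# Build an XML sitemap <urlset> from a list of pages.
pages = [
    ("https://example.com/", "weekly", "1.00"),
    ("https://example.com/blog", "daily", "0.80"),
]

urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
for loc, changefreq, priority in pages:
    url = ET.SubElement(urlset, "url")
    ET.SubElement(url, "loc").text = loc
    ET.SubElement(url, "changefreq").text = changefreq
    ET.SubElement(url, "priority").text = priority

print(ET.tostring(urlset, encoding="unicode"))
```

The output can be written to a file (conventionally /sitemap.xml) and referenced from robots.txt or submitted via the search engines' webmaster tools.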
Bonus: URL Parameters
URL parameters are variables sometimes seen at the end of “clean” URLs, denoted by the presence of a “?” symbol. Their purpose is to help track information such as session IDs, languages, and product filter options.
http://example.com/products/women/dresses?sessionid=12345
http://example.com/products/women/dresses?sessionid=34567
http://example.com/products/women/dresses?sessionid=34567&source=google.com
Pro Tip: Search engines sometimes read these parameters as entirely separate URLs, creating duplicate-content issues. For this reason, it is best practice to avoid URL parameters when possible. For sites already using them, adding a rel-canonical tag pointing to the clean version of the URL is a perfectly viable solution.
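Recovering the clean URL (the canonical target) from a parameterized one can be sketched with Python's standard-library `urllib.parse`. The tracking parameter names below are assumptions for illustration:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Parameters treated as tracking noise (illustrative names).
TRACKING_PARAMS = {"sessionid", "source"}

def clean_url(url):
    """Strip tracking parameters, keeping any meaningful ones."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query) if k not in TRACKING_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

print(clean_url("http://example.com/products/women/dresses?sessionid=12345"))
# http://example.com/products/women/dresses
```

Note that meaningful parameters (e.g. a product filter) are kept, so the function collapses only the tracking variants onto one clean URL.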
Did you find this list helpful? Check out Wpromote’s complete SEM Glossary here!