
Robots.txt Guide

Following Google’s best practices can mean checking a lot of technical items for your website, including having and “optimizing” your robots.txt file. Whether you’re looking to understand what a robots.txt file is or how you should utilize your current file, follow our guide to get all your robots.txt questions answered.

Sections:

  1. What Is Robots.txt?
  2. Why Learn About Robots.txt?
  3. Do I Need A Robots.txt File On My Website?
  4. How To Check If You Have A Robots.txt File
  5. How Do I Make A Robots.txt File?
  6. What Should Be In My Robots.txt File?
  7. Where Should My Robots.txt File Go?
  8. What Are The Pros And Cons?
  9. How To Test Your Robots.txt File

What Is Robots.txt?

A robots.txt file, which implements the robots exclusion standard, is a text file used to instruct robots (search engine crawlers) how to crawl your website. It can also tell search engine robots which pages on your website not to crawl.

Now that you know what a robots.txt file does, how do you use it to tell search engine robots to crawl or not crawl certain pages? These instructions are written using “Disallow” and “Allow” directives. Let’s first take a look at the basic setup:

User-agent: [user-agent name]
Disallow: [URL string not to be crawled]

The “User-agent” line names the robot the rules apply to (an asterisk means all robots), while “Disallow” tells that robot to ignore a specific page or section of the website.
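For example, a minimal file that tells every robot to skip a directory (here a hypothetical /private/ directory, used only for illustration) would look like this:

```
User-agent: *
Disallow: /private/
```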

Why Learn About Robots.txt?

There are a lot of reasons for you to want to know about having a robots.txt file on your website. Here are a few key reasons:

  1. Having an improperly set up robots.txt file can hurt your website and your rankings.
  2. Your robots.txt file is what controls how search engine robots crawl and read your website pages. Making sure they are reading it correctly is very important because that will affect your website's rankings.
  3. Having a robots.txt file is an SEO best practice and something Google itself recommends.

Do I Need A Robots.txt File On My Website?

Depending on your website, you may not need a robots.txt file. However, many factors go into that decision, so make sure you thoroughly understand the reasons for and against having one:

Cases In Which You Want To Have A Robots.txt File:

  1. There are some pages of content you don’t want to be indexed
  2. You are utilizing paid links or paid advertisements that have special instructions for robots
  3. You are in the process of creating or developing a site that is live, but don’t want search engine robots to crawl it yet

Cases In Which You May Not Want To Have A Robots.txt File:

  1. Your website is simple and doesn’t have any errors
  2. You don’t have any pages or files that you want or need to have blocked by search engines

If you decide not to have a robots.txt file, understand that you are giving search engine robots full access to your website.

How To Check If You Have A Robots.txt File

Unsure whether you have a robots.txt file? To check, type your root domain (www.example.com) into your browser and add “/robots.txt” to the end of the URL. For instance, Wpromote’s robots.txt file is located at www.wpromote.com/robots.txt. If no text file appears, your website does not currently have a live robots.txt file.
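That manual check can also be sketched in code. Here is a small helper (the function name is our own, not an established API) that builds the robots.txt URL for any root domain:

```python
def robots_url(domain: str, scheme: str = "https") -> str:
    """Build the robots.txt URL for a root domain.

    Hypothetical helper mirroring the manual check described above.
    """
    domain = domain.strip().rstrip("/")
    return f"{scheme}://{domain}/robots.txt"

print(robots_url("www.wpromote.com"))  # https://www.wpromote.com/robots.txt
```

Visiting the resulting URL in a browser (or fetching it with any HTTP client) tells you whether the file exists.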

How Do I Make A Robots.txt File?

Creating a robots.txt file isn’t as hard as it sounds. If you can type or copy and paste, you can make a robots.txt file. The file, just as the name implies, is a plain text file, meaning you can use something as simple as Notepad or any plain text editor to make one.

Google’s documentation walks you through the process in more detail, and once you’re done, you can always use Google’s robots.txt testing tool to check whether your file has been set up correctly. If you need a reference for a well-put-together robots.txt file, check out the examples in the next section.

What Should Be In My Robots.txt File?

There are a lot of things you can put into your robots.txt file if you need to. However, the most basic of things that need to be added into your robots.txt file are:

  1. Full allow: this means you are letting search engine robots crawl all your pages
  2. Full disallow: this means you are not letting search engine robots crawl any of your pages
  3. Conditional allow: this means you have various directives within your robots file that determine whether search engine robots can crawl pages

Here are some common examples:

Allow Full Access

User-agent: *
Disallow:

Block All Access

User-agent: *
Disallow: /

Block One Folder

User-agent: *
Disallow: /folder/

Block One File

User-agent: *
Disallow: /file.html

Which of these rules your site needs is something only you can determine, based on your content and goals.
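If you want to sanity-check how rules like these behave, Python’s standard library includes a robots.txt parser. This sketch feeds it the “Block One Folder” example from above and checks which URLs a robot may fetch (example.com is just a placeholder domain):

```python
from urllib.robotparser import RobotFileParser

# The "Block One Folder" rules shown above
rules = """\
User-agent: *
Disallow: /folder/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# URLs outside the blocked folder are crawlable; those inside are not.
print(parser.can_fetch("*", "https://www.example.com/page.html"))      # True
print(parser.can_fetch("*", "https://www.example.com/folder/a.html"))  # False
```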

Where Should My Robots.txt File Go?

If you remember from earlier how you can check whether you have a robots.txt file, that’s exactly where it should go if you create one. The robots.txt file should always live at the root of your domain, just like this: www.wpromote.com/robots.txt.

Every version your domain responds to, whether that’s “www” or “non-www,” “https” or plain “http,” needs to serve the same robots.txt file. Here are examples of what I mean:

www.example.com/robots.txt
example.com/robots.txt
http://example.com/robots.txt
https://example.com/robots.txt

It’s also important to know that capitalization matters. You should always use “robots.txt” and never “robots.TXT.”

What Are The Pros And Cons Of A Robots.txt File?

If you’re still contemplating whether or not you may want a robots.txt file, let’s take a look at some of the pros and cons of having one.

Pros Of Having A Robots.txt File:

  1. The ability to block sections of your website from being crawled. This helps you manage your “crawl budget,” a term commonly used to describe the number of pages search engine robots will crawl on your site in a given period. Blocking unimportant sections means that budget is spent on the pages that matter, and it’s especially beneficial when you want to quickly block sections of your site from being crawled.
  2. If you’re going through a content refresh on your website, you can use the robots.txt file to block “low-quality” pages. That way, these don’t harm your overall site performance.

Cons Of Having A Robots.txt File:

  1. While you might have told the search engine robots you don’t want them to crawl a certain page, that doesn’t mean it won’t still show up in the search results. If that page is linked to enough times, the search engine will include it within the results; it just won’t know what’s on that page. To make sure that a page is truly blocked from showing up in the search results, you’ll need to use a meta robots noindex tag. However, if this is the route you want to take, you’ll need to allow the search engine to access the page to find the noindex tag—in which case you would not want to block it within the robots.txt file.
  2. Completely disallowing a page from being crawled by search engine robots also removes any link value that would pass through it. If you want link value to spread to other links found on that page, you’ll again need to use a meta robots noindex tag instead of blocking the page within the robots.txt file.
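For reference, the meta robots noindex tag mentioned above goes in the <head> of the page you want kept out of search results. A typical form looks like this:

```html
<meta name="robots" content="noindex">
```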

How To Test Your Robots.txt File

To test whether your newly created or existing robots.txt file is set up correctly, you can utilize the robots.txt tester in Google Search Console. Simply click into your website, click on “Crawl” and select “robots.txt tester.” You are then able to submit any URL to see if it’s crawlable or being blocked.


written by: Mary Beczak
