Following Google’s best practices means checking a lot of technical items on your website, including having (and optimizing) your robots.txt file. Whether you’re looking to understand what a robots.txt file is or how to make the most of your current file, follow our guide to get all your robots.txt questions answered.
A robots.txt file is a plain text file that implements the robots exclusion standard (also known as the robots exclusion protocol). It is used to control and instruct how robots (search engine crawlers) crawl your website, including telling them which pages not to crawl.
Now that you know what a robots.txt file does, how do you use it to tell search engine robots to crawl or not crawl certain pages? These instructions are written using “Disallow” and “Allow” directives. Let’s first take a look at the basic setup:
User-agent: [user-agent name]
Disallow: [URL string not to be crawled]
The “User-agent” line names which robot the rules apply to (an asterisk, *, means all robots), while “Disallow” tells that robot to ignore a specific page, a folder, or the entire website. Here is an example:
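This is a minimal sketch, using a hypothetical /private/ folder as the blocked path:

User-agent: *
Disallow: /private/

It tells every robot (the asterisk matches all user agents) not to crawl anything under the /private/ folder.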
There are a lot of reasons to know about having a robots.txt file on your website, from keeping duplicate or private pages out of search results to managing how crawlers spend their time on your site.
Depending on your website, you may not need a robots.txt file at all. A lot of factors go into whether you want one, so it’s worth making sure you thoroughly understand the reasons for (and against) having one.
If you decide not to have a robots.txt file, understand that by leaving it out you are allowing search engine robots full access to your website.
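In practice, that’s equivalent to serving a file with an empty Disallow rule (you’ll see it again among the common examples below):

User-agent: *
Disallow: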
Unsure whether you have a robots.txt file? To check if you’ve already got one, simply type your root domain (www.example.com) into your browser and add “/robots.txt” to the end of the URL. For instance, Wpromote’s robots file is located at www.wpromote.com/robots.txt. If no .txt file appears, your website does not have a current or “live” robots.txt file.
Creating a robots.txt file isn’t as hard as it sounds. If you can type or copy and paste, you can make a robots.txt file. The file, just as the name implies, is a plain text file, meaning you can use something as simple as Notepad or any plain text editor to make one.
Google’s helpful article linked here walks you through the process in more detail, and once you’re done, you can always use this tool to test whether your file has been set up correctly. If you need a reference for what a well-put-together robots.txt file looks like, check out the files that major websites publish at their own root domains.
There are a lot of things you can put into your robots.txt file if you need to, but at its most basic the file only needs two directives: a “User-agent” line and a “Disallow” line.
Here are some common examples:
Allow all robots full access to the entire site:

User-agent: *
Disallow:

Block all robots from the entire site:

User-agent: *
Disallow: /

Block all robots from a specific folder:

User-agent: *
Disallow: /folder/

Block all robots from a specific file:

User-agent: *
Disallow: /file.html
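Since “Allow” was mentioned earlier, here is a sketch of combining the two directives; the folder and file names are hypothetical. Google’s crawler honors the more specific rule, so everything in /folder/ is blocked except the one page:

User-agent: *
Disallow: /folder/
Allow: /folder/public-page.html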
Which of these rules, if any, fits your website is something only you can determine.
If you remember from earlier how to check whether you have a robots.txt file, that’s exactly where it should go if you create one. The robots.txt file should always live at the root of your domain, just like this: www.wpromote.com/robots.txt.
Every version your domain responds on, whether that’s “www” or “non-www,” “https” or plain “http,” needs to serve the same robots.txt file. Here are examples of what I mean:

www.example.com/robots.txt
example.com/robots.txt
http://example.com/robots.txt
https://example.com/robots.txt
It’s also important to know that capitalization matters. You should always use “robots.txt” and never “robots.TXT.”
If you’re still contemplating whether you want a robots.txt file, revisit the reasons for and against having one that we covered above.
To test whether your newly created or existing robots.txt file is set up correctly, you can use the robots.txt Tester in Google Search Console. Simply click into your website, click on “Crawl,” and select “robots.txt Tester.” You can then submit any URL to see whether it’s crawlable or being blocked.
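For example, if your file contains a rule like this (the folder name is hypothetical):

User-agent: *
Disallow: /folder/

then submitting www.example.com/folder/page.html in the tester should come back as blocked, while a URL outside that folder should come back as allowed.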