Robots.txt – General information
Robots.txt is a text file located in the site’s root directory that tells search engine crawlers and spiders which pages and files you do or don’t want them to visit. Usually, site owners want to be noticed by search engines, but there are cases when that is not desirable: for instance, if you store sensitive data, or if you want to save bandwidth by excluding heavy pages with images from crawling.
When a crawler accesses a site, it first requests a file named ‘/robots.txt’. If such a file is found, the crawler reads it for instructions on which parts of the website it may crawl.
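The check a well-behaved crawler performs can be sketched with Python’s standard `urllib.robotparser` module. This is only an illustration: the rules, the `ExampleBot` user agent and the `www.example.com` URLs are assumptions made up for the example, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text the way a polite crawler would before fetching pages."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# "ExampleBot" is a made-up user agent; it falls under the "User-agent: *" rules.
blocked = parser.can_fetch("ExampleBot", "https://www.example.com/private/report.html")
allowed = parser.can_fetch("ExampleBot", "https://www.example.com/index.html")
print(blocked)  # False: the path starts with /private/
print(allowed)  # True: no rule matches this path
```

Keep in mind that this only models crawlers that choose to honor the file; robots.txt is a request, not an access control.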
NOTE: There can be only one robots.txt file per website. A robots.txt file for an addon domain needs to be placed in that domain’s own document root.
Robots.txt and SEO
Removing exclusion of images
The default robots.txt file in some CMS versions is set up to exclude your images folder. This issue doesn’t occur in the newest CMS versions, but the older versions need to be checked.
This exclusion means your images will not be indexed or included in Google Image Search. In most cases you do want them included, as it can improve your SEO rankings.
Should you want to change this, open your robots.txt file and remove the line that says:
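The exact line depends on your CMS and is not reproduced here. As a rough sketch, assuming the images folder is `/images/`, the rule would typically read `Disallow: /images/`. The Python snippet below uses that assumed rule, together with a made-up `/admin/` rule and a hypothetical image URL, to show the effect of removing it.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, url: str, agent: str = "Googlebot-Image") -> bool:
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Assumed default file; both Disallow paths are hypothetical examples.
WITH_EXCLUSION = """\
User-agent: *
Disallow: /admin/
Disallow: /images/
"""

# The same file after removing the (assumed) images rule.
WITHOUT_EXCLUSION = "\n".join(
    line
    for line in WITH_EXCLUSION.splitlines()
    if line.strip() != "Disallow: /images/"
)

IMAGE_URL = "https://www.example.com/images/logo.png"  # hypothetical URL

before = is_allowed(WITH_EXCLUSION, IMAGE_URL)
after = is_allowed(WITHOUT_EXCLUSION, IMAGE_URL)
print(before)  # False: images are excluded from crawling
print(after)   # True: images can be crawled again
```

The other rules in the file stay in place; only the images exclusion is dropped.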
Adding reference to your sitemap.xml file
If you have a sitemap.xml file (and you should, as it helps your SEO rankings), it is a good idea to include the following line in your robots.txt file:
(This line needs to be updated with your domain name and sitemap file).
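A sitemap reference takes the form `Sitemap:` followed by the absolute URL of the sitemap file. The sketch below assumes a hypothetical domain `www.example.com`, a sitemap named `sitemap.xml` and a made-up `/admin/` rule, and uses Python’s `urllib.robotparser` (the `site_maps()` method requires Python 3.8 or later) to confirm that the line is picked up.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; replace the domain and file name with your own.
ROBOTS_WITH_SITEMAP = """\
User-agent: *
Disallow: /admin/

Sitemap: https://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_WITH_SITEMAP.splitlines())

# site_maps() returns a list of sitemap URLs, or None if the file lists none.
sitemaps = parser.site_maps()
print(sitemaps)
```

The Sitemap line is independent of the User-agent groups, so it can sit anywhere in the file.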
- You can also use the file to prevent specific pages, such as login or 404 pages, from being crawled, but this is better done using the robots meta tag.
- Adding disallow statements to a robots.txt file does not remove content. It simply blocks spiders from accessing it. If there is content that you want removed from search results, it’s better to use a meta noindex tag.
- As a rule, the robots.txt file should never be used to handle duplicate content. There are better ways, such as the rel=canonical tag, which is placed in the HTML head of a webpage.
- Always keep in mind that robots.txt is not subtle. There are often other tools at your disposal that can do a better job, such as the parameter handling tools within Google and Bing Webmaster Tools, the X-Robots-Tag header and the robots meta tag.
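To make the page-level alternatives above more concrete, here is a minimal sketch of a page carrying a robots meta tag and a rel=canonical link, with a small Python helper that reads both. The HTML, the URL and the `HeadDirectives` class are hypothetical and written only for this illustration.

```python
from html.parser import HTMLParser


class HeadDirectives(HTMLParser):
    """Collect the robots meta directives and the canonical URL of a page."""

    def __init__(self):
        super().__init__()
        self.robots = []       # e.g. ["noindex", "follow"]
        self.canonical = None  # href of <link rel="canonical">, if present

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            self.robots += [p.strip().lower() for p in content.split(",") if p.strip()]
        elif tag == "link" and (attrs.get("rel") or "").lower() == "canonical":
            self.canonical = attrs.get("href")


# Hypothetical login page that should stay out of search results.
PAGE = """<html><head>
<meta name="robots" content="noindex, follow">
<link rel="canonical" href="https://www.example.com/page">
</head><body>Login</body></html>"""

directives = HeadDirectives()
directives.feed(PAGE)
print(directives.robots)     # ['noindex', 'follow']
print(directives.canonical)  # https://www.example.com/page
```

For non-HTML files such as PDFs, the same directive can be sent as an HTTP response header, for example `X-Robots-Tag: noindex`.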