Robots Exclusion, known in short as robots.txt, is a text file that allows or blocks search engine crawlers from crawling or indexing a specific set of pages on a website. The robots file is simply a text file, but a very powerful one, normally used to block search engines from crawling or indexing particular directories like /admin, /scripts and other similar areas. If you don’t know how the robots file is used, it is better to stay away from it until you have proper knowledge: a single wrong command in this file can cause the search engine bots to exclude your site from indexing.

The Robots Exclusion Standard was proposed by Dutch software engineer Martijn Koster, and it is summed up as follows:

“Web site owners use the /robots.txt file to give instructions about their site to web robots; this is called The Robots Exclusion Protocol.”


Search engines use bots, or web crawlers, to crawl a website. These bots are designed to look for a robots.txt file and to crawl according to its rules, so all the important info which you don’t want to show to the search engines remains hidden from the crawlers. The typical position of a robots.txt file is the root directory of your website’s server.

For example: http://example.com/robots.txt


Check out this quick official video from Google (by Matt Cutts) on the robots.txt file:

Can I use robots.txt to optimize Googlebot’s crawl?

The text file contains clear-cut instructions that tell the bots which individual files or directories to avoid. The important commands used in the robots.txt file are “Allow” and “Disallow”; these commands direct the web crawlers toward or away from specific locations and files. Here’s a view of a typical text file and the commands used in it.

  • User-agent: *
  • Disallow: /cgi-bin/
  • Allow: /

  • “User-agent: *” signifies that the rules that follow apply to all robots.
  • “Disallow: /cgi-bin/” tells the robots not to visit the /cgi-bin/ directory.
  • “Allow: /” signifies that the robots may crawl everything else on the website.
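You can also address one crawler by name by giving it its own User-agent group. Here is a minimal sketch (Googlebot is Google’s real crawler name, while the /private/ directory is purely a placeholder for illustration):

  • User-agent: Googlebot
  • Disallow: /private/
  • User-agent: *
  • Disallow: /

With rules like these, Googlebot is kept out of /private/ only, while all other bots are blocked from the entire site, since each crawler follows the most specific group that matches its name.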

The text file is usually used by an administrator when a file or directory is not relevant or appropriate for search. So if you disallow a complete URL in the robots.txt file, that URL will generally stay out of the search engines and the search results until you allow it again.
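For instance, to keep a single page out of the crawl, you would disallow its path. A quick sketch, where /old-offer.html is a purely hypothetical page:

  • User-agent: *
  • Disallow: /old-offer.html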

How to Use a Robots.txt File?

WordPress uses a virtual robots.txt file, which you will not find on your FTP server. As soon as you publish your first post, WordPress automatically renders this virtual robots.txt file, but a thing to keep in mind is that the generated file does not contain any restrictions of its own. You have to edit it and create your own restrictions yourself. To edit the text file, just install a plugin such as “WP Robots Txt” or Kbrobots.

Must Read: Best WordPress based eCommerce Theme Frameworks

How to Install the WP Robots Txt Plugin in WordPress?

Step 1: Just navigate to the Plugins section and click on the Add New link.

Step 2: Search for the ‘WP Robots Txt’ plugin in the search box.

Step 3: Once you find the plugin, click on the Install Now button to install it.

Step 4: After installing the plugin, click on Activate Plugin to activate it.

Step 5: Now go to the Settings drop-down menu and look for the Robots.txt Content field.

Step 6: In the content field you will find the default commands:

  • User-agent: *
  • Disallow: /wp-admin/
  • Disallow: /wp-includes/

Edit your text file following the guidelines at robotstxt.org.
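For example, an edited file for a typical WordPress site might look like the sketch below. The Allow exception and the Sitemap line are common additions rather than defaults, and the sitemap URL is just a placeholder; most major crawlers honor a more specific Allow rule, although the original standard did not define Allow at all:

  • User-agent: *
  • Disallow: /wp-admin/
  • Allow: /wp-admin/admin-ajax.php
  • Disallow: /wp-includes/
  • Sitemap: http://example.com/sitemap.xml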

The commands used in the robots.txt file are case-sensitive, and every stray space can render a rule useless, so be very careful when writing them. Pay particular attention to upper-case and lower-case letters, as these are a common source of mistakes.
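To see why case matters, consider this sketch (both paths are hypothetical):

  • User-agent: *
  • Disallow: /Admin/

This blocks /Admin/ but leaves /admin/ crawlable, because crawlers match paths character by character.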

So do use a robots.txt file to protect your own privacy and to discourage fake and spam web crawlers from reading your private web info.

Related Useful Tips

  1. You can also test and edit your manually made robots.txt file using Google Webmaster Tools: just log in, submit your file, and click the Test button to test it.
  2. Use the PC Robots.txt plugin to automatically create a virtual robots.txt file for your website. The text file can then be edited and managed from the plugin’s settings page.
  3. Also know that the robots.txt file is only a directive; search engines may or may not follow it, but most of the time they check it before they start crawling or indexing a webpage.
  4. The directives a crawler supports, and how it interprets them, vary from search engine to search engine.