Problems with the default Drupal robots.txt file

There are several problems with the default Drupal robots.txt file. If you use the robots.txt testing utility in Google Webmaster Tools (detailed instructions on this utility later in this chapter) to test each line of the file, you'll find that many paths that look like they're being blocked will actually be crawled. The reason is that Drupal does not require the trailing slash (/) after the path to show you the content. A Disallow rule matches any URL path that begins with the rule's text, so a rule that ends in a slash does not match the same path without the slash. Googlebot will avoid the page with the slash but crawl the page without it.

Google what? Googlebot! Google and other search engines use server systems (sometimes called spiders, crawlers, or robots) to go around the Internet and find each web site. We sometimes refer to Google's system as the Googlebot to distinguish it from other search engine robots. While Google doesn't report this number anymore, it is estimated that the Googlebot crawls 10 billion web pages each week! That is a fast little robot.

For example, /admin/ is listed as disallowed. As you would expect, the testing utility shows that http://www.yourDrupalsite.com/admin/ is disallowed. But, put in http://www.yourDrupalsite.com/admin (without the trailing slash) and you'll see that it is allowed. Disaster! Fortunately, this is relatively easy to fix.
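The trailing-slash behavior can also be reproduced outside Google's testing utility. The following is a minimal sketch that uses Python's standard urllib.robotparser module, which applies the same kind of prefix matching; it approximates how a crawler reads the rules and is not a substitute for testing with Google's own tool. The helper name is_allowed and the two sample rule sets are assumptions made for illustration.

```python
from urllib import robotparser


def is_allowed(rules, url, agent="Googlebot"):
    """Return True if `url` may be crawled under the given robots.txt text."""
    parser = robotparser.RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


# The rule as it appears in the default file (with a trailing slash).
default_rules = "User-agent: *\nDisallow: /admin/\n"
# The same rule plus a copy without the trailing slash.
fixed_rules = "User-agent: *\nDisallow: /admin/\nDisallow: /admin\n"

site = "http://www.yourDrupalsite.com"
for label, rules in (("default", default_rules), ("fixed", fixed_rules)):
    for path in ("/admin/", "/admin"):
        verdict = "allowed" if is_allowed(rules, site + path) else "blocked"
        print(label, path, verdict)
```

Under the default rule, only the URL with the trailing slash should be reported as blocked; with the slashless rule added, both forms should be blocked.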

Fixing the Drupal robots.txt file

Carry out the following steps in order to fix the Drupal robots.txt file:

1. Make a backup of the robots.txt file.

2. Open the robots.txt file for editing. If necessary, download the file and open it in a local text editor.

3. Find the Paths (clean URLs) section and the Paths (no clean URLs) section. Note that both sections appear whether you've turned on clean URLs or not. Drupal covers you either way. They look like this:

# Paths (clean URLs)
Disallow: /admin/
Disallow: /comment/reply/
Disallow: /contact/
Disallow: /logout/
Disallow: /node/add/
Disallow: /search/
Disallow: /user/register/
Disallow: /user/password/
Disallow: /user/login/

# Paths (no clean URLs)
Disallow: /?q=admin/
Disallow: /?q=comment/reply/
Disallow: /?q=contact/
Disallow: /?q=logout/
Disallow: /?q=node/add/
Disallow: /?q=search/
Disallow: /?q=user/password/
Disallow: /?q=user/register/
Disallow: /?q=user/login/

4. Duplicate the two sections (simply copy and paste them) so that you have four sections: two # Paths (clean URLs) sections and two # Paths (no clean URLs) sections.

5. Add 'fixed!' to the comment of the new sections so that you can tell them apart.

6. Delete the trailing / from the end of the path on each Disallow line in the fixed! sections. You should end up with four sections that look like this:

# Paths (clean URLs)
Disallow: /admin/
Disallow: /comment/reply/
Disallow: /contact/
Disallow: /logout/
Disallow: /node/add/
Disallow: /search/
Disallow: /user/register/
Disallow: /user/password/
Disallow: /user/login/

# Paths (no clean URLs)
Disallow: /?q=admin/
Disallow: /?q=comment/reply/
Disallow: /?q=contact/
Disallow: /?q=logout/
Disallow: /?q=node/add/
Disallow: /?q=search/
Disallow: /?q=user/password/
Disallow: /?q=user/register/
Disallow: /?q=user/login/

# Paths (clean URLs) - fixed!
Disallow: /admin
Disallow: /comment/reply
Disallow: /contact
Disallow: /logout
Disallow: /node/add
Disallow: /search
Disallow: /user/register
Disallow: /user/password
Disallow: /user/login

# Paths (no clean URLs) - fixed!
Disallow: /?q=admin
Disallow: /?q=comment/reply
Disallow: /?q=contact
Disallow: /?q=logout
Disallow: /?q=node/add
Disallow: /?q=search
Disallow: /?q=user/password
Disallow: /?q=user/register
Disallow: /?q=user/login

7. Save your robots.txt file, uploading it if necessary, replacing the existing file (you backed it up, didn't you?).

8. Go to http://www.yourDrupalsite.com/robots.txt and double-check that your changes are in effect. You may need to refresh your browser to see the changes.

Now your robots.txt file is working as you would expect it to.
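If you would rather not edit each line by hand, steps 4 to 6 can be scripted. The following is a rough sketch in Python, not part of Drupal; the function name fixed_section is hypothetical, and it assumes you pass in the lines of one section exactly as they appear in your robots.txt file. It marks the comment line with "fixed!" and strips the trailing slash from each Disallow path, leaving a bare "/" rule untouched.

```python
def fixed_section(section_lines):
    """Return a copy of one robots.txt section without trailing slashes."""
    fixed = []
    for line in section_lines:
        text = line.strip()
        if text.startswith("#"):
            # Mark the comment so the new section can be told apart.
            fixed.append(text + " - fixed!")
        elif text.lower().startswith("disallow:"):
            path = text.split(":", 1)[1].strip()
            # Drop one trailing slash, but never reduce "/" to an empty rule.
            if len(path) > 1 and path.endswith("/"):
                path = path[:-1]
            fixed.append("Disallow: " + path)
        else:
            fixed.append(text)
    return fixed


original = [
    "# Paths (clean URLs)",
    "Disallow: /admin/",
    "Disallow: /comment/reply/",
    "Disallow: /user/login/",
]

fixed = fixed_section(original)
# Keep the original section and append the fixed copy after a blank line.
print("\n".join(original + [""] + fixed))
```

Paste the printed result back into robots.txt in place of the original section, then repeat for the # Paths (no clean URLs) section.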
