Clean URLs

It is important to discuss this topic early on because it acts as a cog in the greater machinery, not only of your site, but also of how your site interacts with the rest of the Internet. The simplicity of the Clean URLs configuration page belies its importance:

[Screenshot: the Clean URLs setting with two radio buttons, Disabled and Enabled]

As you can see, the choice is simple—either enable or disable Clean URLs. Your system should also tell you whether or not it is possible to use clean URLs—if you see something like the following screenshot, then you have problems:

[Screenshot: "Your system configuration does not currently support this feature. The handbook page on Clean URLs has additional troubleshooting information."]

Remember:

It is highly recommended that you have Clean URLs enabled on your site.

The reason for this recommendation is that you naturally want your site to be able to compete fairly with other sites when it comes time for Google and other search engines to index its web pages. Search engines use automated programs (called bots) to traverse the Web, and when they come across nice, straightforward URLs like the ones displayed by Drupal when Clean URLs are enabled, such as http://localhost/drupal/node/2, they happily go about their business, indexing pages.

Indexing allows content to start showing up in Web searches, so more people can find these pages and you're on your way (more or less). If, however, the bots come across dynamic URLs (ones that contain query strings), they often don't put the same effort into indexing those pages or, worse, ignore them entirely. This can lead to a situation where you have a lot of lovely content just waiting to be read, but no one is able to find it because the search engines are ignoring all the pages of the form:

http://localhost/drupal/?q=node/2

The highlighted part of this URL (?q=) is what causes the problem. Drupal navigates around its own pages by a system of internal URLs that it finds using queries in the format shown in the previous URL. In other words, ?q=node/2 is asking Drupal to go and find whatever content or page is held at node/2. The problem is that the Googlebot simply sees the dynamic query and says to itself, "Hmmm, this could be a nasty trick designed to make me index the same page millions of times over, so I won't pay it any mind."
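To make the relationship between the two URL forms concrete, here is a minimal Python sketch that converts one into the other. It is only an illustration, not how Drupal or Apache does the work internally; the function names and the /drupal/ base path are assumptions chosen to match the example URLs in this section.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def to_query_url(clean_url, base_path="/drupal/"):
    """Turn a clean URL (.../drupal/node/2) into the ?q= form.

    base_path is an assumption: the folder Drupal is installed in.
    """
    parts = urlsplit(clean_url)
    if not parts.path.startswith(base_path):
        raise ValueError("URL is not under " + base_path)
    internal_path = parts.path[len(base_path):]  # e.g. "node/2"
    query = "q=" + internal_path
    if parts.query:
        # Keep any query parameters that were already on the URL.
        query += "&" + parts.query
    return urlunsplit((parts.scheme, parts.netloc, base_path, query, parts.fragment))


def to_clean_url(query_url, base_path="/drupal/"):
    """Turn a ?q= URL back into its clean equivalent."""
    parts = urlsplit(query_url)
    pairs = parse_qsl(parts.query)
    # The internal Drupal path is whatever the q parameter holds.
    internal_path = next((value for key, value in pairs if key == "q"), "")
    remaining = urlencode([(key, value) for key, value in pairs if key != "q"])
    return urlunsplit(
        (parts.scheme, parts.netloc, base_path + internal_path, remaining, parts.fragment)
    )


print(to_query_url("http://localhost/drupal/node/2"))
print(to_clean_url("http://localhost/drupal/?q=node/2"))
```

Both forms point at the same internal path (node/2); the only difference is whether that path travels in the query string or in the URL path itself.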

Actually, providing informative names (called aliasing) for posts is far better than relying on Drupal's default numbering system. It's worth skipping ahead and looking over the section on Path & Pathauto in Chapter 10 so that you get into the habit of providing user-friendly and search-engine-friendly aliases for all your content.

The people at Drupal realized this is the case, so if it is possible on your setup, clean URLs are enabled by default and you don't have to worry about any of this anyway. If you have installed Apache2Triad then your development machine is safe in this regard. The problem comes during deployment because it is quite possible that your Internet service provider's setup does not allow for clean URLs. Now what?

If you already know who is going to host your live site, then test things out now by installing a copy of Drupal on the live server and ensuring it is possible to use clean URLs (see Appendix A on Deployment for more information). If you can't, consider finding another host that does allow them. Otherwise, you will end up having to deal with their system admin guys and hang around until they either sort stuff out or eventually start ignoring you.

Whether you can or can't use clean URLs basically comes down to a configuration setting in Apache. On your development machine you have direct access to the httpd.conf file (found in the conf folder of your Apache2Triad installation) that Apache uses for its configuration. This is probably not the case on your live servers, since any given host obviously doesn't want to give everyone using their servers total control to mangle everything as they see fit.

In order for Drupal to implement clean URLs, Apache needs to have mod_rewrite enabled. By way of example, open up httpd.conf and search for the line that reads:

LoadModule rewrite_module modules/mod_rewrite.so

That's the line that determines whether or not Apache can implement what Drupal requires in order to give you clean URLs. If it's commented out you will need to uncomment it and then restart Apache before any changes take effect.
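If you would rather not search the file by eye, the check can be sketched in a few lines of Python. This is a rough illustration only: the function name and the three status strings are made up for this example, and it looks only at the text you pass in, so it will not notice a module loaded from a separate included file.

```python
import re

# Matches the LoadModule line for mod_rewrite, capturing any leading '#'
# characters so we can tell an active line from a commented-out one.
LOADMODULE_RE = re.compile(
    r"^[ \t]*(#*)[ \t]*LoadModule[ \t]+rewrite_module[ \t]+(\S+)",
    re.MULTILINE,
)


def rewrite_module_status(conf_text):
    """Return 'enabled', 'commented out', or 'missing' for mod_rewrite."""
    status = "missing"
    for match in LOADMODULE_RE.finditer(conf_text):
        if match.group(1) == "":
            # An uncommented LoadModule line wins outright.
            return "enabled"
        status = "commented out"
    return status


sample = "#LoadModule rewrite_module modules/mod_rewrite.so\n"
print(rewrite_module_status(sample))
```

To run it against a real file, read the contents of httpd.conf into a string first and pass that string to the function. A result of "commented out" means you need to remove the leading # and restart Apache, as described above.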


If you find that at some stage you fall into the trap of having clean URLs enabled on a system that cannot implement them, causing all sorts of fun problems, then manually navigating to the following page should allow you to disable the clean URLs and use the site as normal:

http://localhost/drupal/?q=admin/settings/clean-urls

Remember to exchange the highlighted part for whatever is pertinent for your setup.
