The Importance of Cron

The indexing process used by Drupal's Search module only works when the "cron" utility has been properly configured. cron is a utility used to run various commands at scheduled intervals on your web server. It is responsible for performing maintenance tasks on a Drupal site, such as clearing old log entries, as well as scheduling bulk email and other tasks that happen with regular frequency.

Each time cron runs, Drupal will catalog some of the site's content; by default, it indexes 200 posts each time. If your site has a large number of posts already, the speed of the indexing will depend on how frequently cron is configured to run on your server.
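As a rough illustration of that relationship, the sketch below estimates how many cron runs a full index would take and roughly how long that adds up to. The 200-items-per-run figure comes from the text above. The function name, the 5,000-post site, and the hourly cron schedule are assumptions made up for the example.

```python
import math
from datetime import timedelta


def estimate_indexing_time(total_posts, cron_interval, items_per_run=200):
    """Return (cron runs needed, approximate elapsed time) to index total_posts.

    items_per_run defaults to the 200 posts per run mentioned in the text;
    cron_interval is however often cron is scheduled on your server.
    """
    if total_posts < 0 or items_per_run <= 0:
        raise ValueError("total_posts must be >= 0 and items_per_run > 0")
    runs = math.ceil(total_posts / items_per_run)
    return runs, runs * cron_interval


# Hypothetical site: 5,000 existing posts, cron scheduled once an hour.
runs, elapsed = estimate_indexing_time(5000, timedelta(hours=1))
print(f"{runs} cron runs, roughly {elapsed} in total")
```

With those assumed numbers the index needs 25 runs, so running cron every 15 minutes instead of hourly would cut the wait to about a quarter.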

If you're not sure whether cron has been set up, or if you're running on your local computer to test the site out, you can tell Drupal to perform its cron tasks by visiting Administer→Reports→Status report (admin/logs/status) and clicking the "run cron manually" link. For more information on setting up cron for your site, see http://drupal.org/cron.

Home > Administer > Site configuration > Search settings

The search engine maintains an index of words found in your site's content. To build and maintain this index, a correctly configured cron maintenance task is required. Indexing behavior can be adjusted using the settings below.

-Indexing status-

100% of the site has been indexed. There are 0 items left to index.

Re-index site

-Indexing throttle-

Number of items to index per cron run:

The maximum number of items indexed in each pass of a cron maintenance task. If necessary, reduce the number of items to prevent timeouts and memory errors while indexing.

-Indexing settings-

Changing the settings below will cause the site index to be rebuilt. The search index is not cleared but systematically updated to reflect the new settings. Searching will continue to work but new content won't be indexed until all existing content has been re-indexed.

The default settings should be appropriate for the majority of sites.

Minimum word length to index: 3

The number of characters a word has to be to be indexed. A lower setting means better search result ranking, but also a larger database. Each search query must contain at least one keyword that is this size (or longer).

Simple CJK handling

Whether to apply a simple Chinese/Japanese/Korean tokenizer based on overlapping sequences. Turn this off if you want to use an external preprocessor for this instead. Does not affect other languages.

-Content ranking-

The following numbers control which properties the content search should favor when ordering the results. Higher numbers mean more influence, zero means the property is ignored. Changing these numbers does not require the search index to be rebuilt. Changes take effect immediately.

Factor                Weight
Keyword relevance
Recently posted       5
Number of comments    5

Save configuration    Reset to defaults

Figure 4-14. The Search module's configuration page

An alternative to setting up cron is to install the Poormanscron module (http://drupal.org/project/poormanscron). This module transparently hands your website's visitors the job of checking whether scheduled events need to happen. Each time a visitor hits the website, Poormanscron checks whether anything has come due since the last time it ran and, if so, performs the cron actions. The check triggers events after the page is loaded, so the visitor doesn't notice the difference.

Of course, this works only if your site gets regular traffic. But then again, if it doesn't, it probably doesn't matter how often your search index is updated.
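The visitor-triggered approach described above can be sketched in a few lines. This is a minimal illustration of the idea, not Poormanscron's actual code: the class name, the one-hour interval, and the task list are all hypothetical, and the "page" is just a string returned by a callback.

```python
import time


class VisitorTriggeredCron:
    """Hypothetical stand-in for the idea behind Poormanscron (not its real code)."""

    def __init__(self, interval_seconds, tasks, clock=time.time):
        self.interval_seconds = interval_seconds
        self.tasks = tasks          # callables to run when a cron pass is due
        self.clock = clock          # injectable so the logic can be tested
        self.last_run = None        # no cron pass recorded yet

    def is_due(self):
        """True if no pass has run yet or the interval has elapsed since the last one."""
        if self.last_run is None:
            return True
        return self.clock() - self.last_run >= self.interval_seconds

    def handle_request(self, render_page):
        """Build the visitor's page first, then run any overdue tasks."""
        response = render_page()
        if self.is_due():
            self.last_run = self.clock()
            for task in self.tasks:
                task()
        return response


# Usage: every page view goes through handle_request().
cron = VisitorTriggeredCron(3600, [lambda: print("indexing a batch of posts")])
page = cron.handle_request(lambda: "<html>front page</html>")
```

In this sketch the tasks run after the page has been rendered but before `handle_request` returns. A real module would defer them until the response has actually been sent, which is what keeps the work invisible to the visitor.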

Figure 4-15. The Advanced Search page in action