Search module search_index

<?php
$tags = array('h1' => 25,
  'h2' => 18,
  'h3' => 15,
  'h4' => 12,
  'h5' => 9,
  'h6' => 6,
  'u' => 3,
  'b' => 3,
  'i' => 3,
  'strong' => 3,
  'em' => 3,
  'a' => 10);
?>

Figure 2: The HTML tags that have value in the search index.


The array keys are the tag names, and the values are weights that will affect the scoring of the words associated with these tags.
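To make the role of the weights concrete, here is a minimal sketch of a lookup. The helper name word_score and the scoring rule (a base score of 1 plus the weight of the enclosing tag) are assumptions made for illustration; the text above does not spell out the exact formula.

```php
<?php
// A subset of the tag weights shown in Figure 2.
$tags = array('h1' => 25, 'h2' => 18, 'strong' => 3, 'a' => 10);

// Hypothetical helper, not part of the search module.
// Assumed rule: a word scores 1 in plain text, plus the weight of the
// tag that encloses it. Unknown tags add nothing.
function word_score($tag, array $tags) {
  $base = 1;
  return $base + (isset($tags[$tag]) ? $tags[$tag] : 0);
}

echo word_score('h1', $tags), "\n"; // word inside <h1>
echo word_score('p', $tags), "\n";  // word inside a tag with no weight
?>
```

Under this assumed rule, a word in a top-level heading counts far more than the same word in body text, which is the effect the weights are meant to have.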

The next step in search_index is to prepare the text for tokenization by inserting spaces between tags and text:

Spaces inserted between tags and text


Figure 3: Prepare to be tokenized
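One plausible way to insert those spaces is a plain string replacement around the angle brackets. The function name insert_tag_spaces and the sample input are hypothetical; this is a sketch of the idea, not necessarily the module's exact code.

```php
<?php
// Hypothetical helper: pad every tag with a space on both sides so that
// words on either side of a tag stay separate once the tag is removed.
function insert_tag_spaces($text) {
  return str_replace(array('<', '>'), array(' <', '> '), $text);
}

$text = '<h1>Title</h1><p>Some<b>bold</b>text</p>';
$spaced = insert_tag_spaces($text);

echo $spaced, "\n";
?>
```

Without the padding, stripping the tags from "Some<b>bold</b>text" would fuse the three words into one token.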

Any tags not in the $tags array are removed. The text is then split so that the whole document becomes an array that alternates between text and tag fragments, which is a compact and rather clever piece of code. For example, the following text gets split into the array in figure 1.
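The strip-and-split step can be sketched with two standard PHP functions, strip_tags() and preg_split(). The input string, the reduced $tags array, and the regular expression below are illustrative assumptions, not the article's own example.

```php
<?php
// Reduced weight table for the illustration.
$tags = array('h1' => 25, 'b' => 3);

// Hypothetical input, already padded with spaces around each tag.
$text = ' <h1> Title </h1>  <p> Some <b> bold </b> text </p> ';

// Build the allow-list "<h1><b>" and drop every other tag (here <p>).
$allowed = '<' . implode('><', array_keys($tags)) . '>';
$text = strip_tags($text, $allowed);

// Split on tags, keeping the tag name (the captured group) in the result.
// Even indexes then hold text fragments and odd indexes hold tag fragments.
$split = preg_split('/\s*<([^>]+?)>\s*/', $text, -1, PREG_SPLIT_DELIM_CAPTURE);

print_r($split);
?>
```

Because the captured delimiter is returned between the pieces it separates, a single preg_split() call yields the alternating text/tag layout, and a loop can tell the two apart from the index parity alone.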
