Scoring words

The token loop then continues, carrying the updated $score and $tagstack into the next iteration. The next item, at position #2 in Figure 3, is the text Taxonomy. In the text-handling portion of the token loop, the text is normally first split into words by the function search_index_split. search_index_split also does a certain amount of processing on the text, which is covered in its own section below, and returns an array of individual words. The loop iterates over this array and inserts each word into a second array, $results, which tracks every unique word in the document together with its accumulated score.
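The accumulation into $results can be illustrated with a minimal sketch. It is written in Python for brevity and is not the Drupal code: split_words is a hypothetical stand-in for search_index_split (it only lowercases and splits on whitespace, which is an assumption), and the scores 25 and 1 are made-up values rather than the ones from Figure 3.

```python
def split_words(text):
    # Hypothetical stand-in for search_index_split.
    # Assumption: lowercase the text and split on whitespace only.
    return text.lower().split()


def score_text(text, score, results):
    # Add the current score to every word found in this piece of text.
    # results maps each unique word to its accumulated score.
    for word in split_words(text):
        results[word] = results.get(word, 0) + score
    return results


results = {}
score_text("Taxonomy", 25, results)       # text seen with an assumed score of 25
score_text("taxonomy terms", 1, results)  # later text seen with an assumed score of 1
print(results)
```

A word that appears more than once ends up with a single entry whose score is the sum of the scores in effect each time it was seen.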

In addition to the word => score pairs collected in $results, the loop builds a running list of every tokenized and processed word, including repeats, and stores it in the $accum variable. $accum is used later to build the search_dataset table.
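The difference between the two structures can be shown with a short sketch, again in Python and under stated assumptions: index_text is a hypothetical helper, words are split on whitespace only, the scores are made up, and joining the accumulated words with spaces is an assumed representation of the text that later goes into search_dataset.

```python
def index_text(text, score, results, accum):
    # Assumption: plain whitespace splitting stands in for search_index_split.
    for word in text.split():
        accum.append(word)  # every processed word, repeats included
        results[word] = results.get(word, 0) + score  # one entry per unique word


results = {}
accum = []
index_text("taxonomy terms", 25, results, accum)
index_text("taxonomy", 1, results, accum)

# Assumed form of the accumulated text: words joined by single spaces.
dataset_text = " ".join(accum)
print(dataset_text)
print(results)
```

Here results keeps one score per unique word, while the accumulated list preserves every occurrence in document order.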
