Figure 21: Normalizing word counts according to Zipf's law.

Note that more frequent words have higher counts, which makes them less valuable in the index. Words like "and", "the", and "are" naturally have very high counts because they are so common in English, whereas on most websites words like "badgers" and "ponies" would have low counts. This limited sample gets things a bit backwards because of the author's obsession with ponies and badgers ;-)
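
To make the idea of "higher count, lower value" concrete, here is a minimal sketch in Python. It is not the code behind the figure: the function name `word_weights`, the sample sentence, and the choice of weight (total tokens divided by the word's count) are all assumptions made for illustration. Any weight that shrinks as the count grows would show the same effect.

```python
from collections import Counter
import re


def word_weights(text):
    """Map each word to (count, weight); the weight shrinks as the count grows.

    Hypothetical helper: the weight is total tokens / word count, i.e. the
    inverse of the word's relative frequency. This is an illustrative choice,
    not necessarily the normalization used for the figure.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = sum(counts.values())
    return {word: (count, total / count) for word, count in counts.items()}


# Made-up sample text, chosen so that "the" dominates.
sample = "the badgers and the ponies and the badgers are the best"
weights = word_weights(sample)

# List the words from most to least frequent.
for word, (count, weight) in sorted(weights.items(), key=lambda kv: -kv[1][0]):
    print(f"{word:8} count={count} weight={weight:.2f}")
```

In this sample "the" has the highest count and therefore the lowest weight, while a word that appears only once, such as "ponies", gets the highest weight.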

