Don Dodge has an intriguing post on how search engines rank pages by on-page factors, which he equates to the long tail of search. Don knows a great deal about on-page ranking factors, as he used to work for AltaVista as Director of Engineering.
"Amazingly, just 10 words account for 25% of the words found in a typical paragraph."
The figure comes from the Oxford English Corpus, the largest English-language corpus of its type. If you were to read through the corpus, one word in four would be an example of one of the ten most common lemmas (table below, care of Alex Barnet’s blog).
| Vocabulary size (no. lemmas) | % of content in OEC | Example lemmas |
|---|---|---|
| 10 | 25% | the, of, and, to, that, have |
| 100 | 50% | from, because, go, me, our, well, way |
| 1000 | 75% | girl, win, decide, huge, difficult, series |
| 7000 | 90% | tackle, peak, crude, purely, dude, modest |
| 50,000 | 95% | saboteur, autocracy, calyx, conformist |
| >1,000,000 | 99% | laggardly, endobenthic, pomological |
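
The coverage idea in the table can be checked on any text of your own. The snippet below is a minimal sketch, not how the OEC figures were produced: it counts raw word forms rather than lemmas, and the function name `coverage_of_top_words` and the sample sentence are made up for illustration.

```python
import re
from collections import Counter


def coverage_of_top_words(text: str, top_n: int) -> float:
    """Return the fraction of all word tokens accounted for by the
    top_n most frequent word forms in the text.

    Assumption: plain word forms stand in for lemmas, so "go" and
    "went" are counted separately, unlike in the OEC table.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    covered = sum(count for _, count in counts.most_common(top_n))
    return covered / len(tokens)


# Hypothetical sample text; any page copy can be substituted.
sample = "the cat and the dog sat on the mat and the cat slept"
print(coverage_of_top_words(sample, 2))
```

On real page copy, the curve usually climbs steeply for the first few words and then flattens, which is the long tail the table describes.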
This is what one long-tailed tagcloud might look like:
According to Don Dodge, the most basic and important factor in ranking search results is the uniqueness of the words contained in the query. All known words are given a weighting factor, with the rarest words given the highest weight. The most common words are ignored, and some common words typically used in spam are given a negative weighting.
Here are some tips Don Dodge came up with to improve organic search results:
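A rough way to picture that weighting is the familiar inverse document frequency idea: the fewer documents a word appears in, the more it counts. The sketch below is an assumption about the general shape, not the formula AltaVista or any other engine actually used; the names `word_weight`, `doc_freq`, `stop_words`, `spam_words`, and `spam_penalty`, and all the numbers, are hypothetical.

```python
import math


def word_weight(word, doc_freq, total_docs, stop_words, spam_words,
                spam_penalty=-1.0):
    """Weight a query word by rarity.

    - Very common words (stop_words) are ignored: weight 0.
    - Words flagged as spam get a fixed negative weight (assumed value).
    - Everything else gets a smoothed inverse-document-frequency score,
      so rarer words receive higher weights.
    """
    word = word.lower()
    if word in stop_words:
        return 0.0
    if word in spam_words:
        return spam_penalty
    df = doc_freq.get(word, 0)
    return math.log((total_docs + 1) / (df + 1))


# Hypothetical index statistics: number of documents containing each word.
doc_freq = {"the": 1000, "search": 120, "endobenthic": 1}
stop_words = {"the", "of", "and", "to"}
spam_words = {"viagra"}

for w in ["the", "search", "endobenthic", "viagra"]:
    print(w, word_weight(w, doc_freq, 1000, stop_words, spam_words))
```

In this toy index the rare word gets the largest weight, the common word is ignored, and the spam-associated word is pushed below zero.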
Hint #1 - Don’t use common words, or words typically associated with spam. Viagra will get you a negative weighting unless you are from the drug industry and have reputable links to your page.
Hint #2 - The frequency of a word on a page, and the position of the word on the page, are very important factors in ranking results. However, use a word too often and it will be detected as spam and given a huge negative weight. For example, in the early days webmasters would repeat a word over and over in the text, place it in the title and meta tags, and even put it on the page in a white font so the user wouldn’t see it but the search engine would. The search engines got wise to all these tricks and adjusted their algorithms accordingly.
Hint #3 - Words that appear in the title of a page are more important than the same word in the body of the text. Words that appear in bold are more important than other words.
Hints 1-3 are really standard fare for SEO; working more unique forms of a word or idea into a page should result in higher rankings overall.
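To see how hints 2 and 3 might combine, here is a toy on-page scorer. It is only a sketch under stated assumptions: the function `score_page`, the bonus values, and the `max_density` threshold are invented for illustration and do not come from Don Dodge's post or from any real search engine.

```python
import re


def score_page(term, title, body, bold_words=(), max_density=0.2):
    """Toy on-page score for a single query term.

    Assumed rules, loosely following the hints above:
    - each occurrence in the body adds 1 point (frequency)
    - an earlier first occurrence adds up to 1 point (position)
    - appearing in the title adds 5 points
    - appearing in bold adds 2 points
    - if the term exceeds max_density of the body, treat it as
      keyword stuffing and return a large negative score
    """
    term = term.lower()
    body_tokens = re.findall(r"[a-z0-9']+", body.lower())
    title_tokens = re.findall(r"[a-z0-9']+", title.lower())

    count = body_tokens.count(term)
    if body_tokens and count / len(body_tokens) > max_density:
        return -10.0

    score = float(count)
    if count:
        first = body_tokens.index(term)
        score += 1.0 - first / len(body_tokens)
    if term in title_tokens:
        score += 5.0
    if term in {w.lower() for w in bold_words}:
        score += 2.0
    return score


body = "our guide to organic search covers how ranking works for small sites"
print(score_page("ranking", "Ranking basics", body, bold_words=["ranking"]))
print(score_page("ranking", "Basics", body))
print(score_page("ranking", "Ranking", "ranking ranking ranking ranking tips"))
```

The same body scores higher when the term also appears in the title or in bold, while the stuffed page ends up with a negative score.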
