Thursday, July 24, 2008

PageRank: What Google deems important

At the heart of the Google algorithm is the same link-based system that was developed at Stanford. It is called PageRank (after Larry Page, its inventor, rather than after web pages themselves). Google explains PageRank in the following way:
PageRank relies on the uniquely democratic nature of the web by using its vast link structure as an indicator of an individual page’s value. In essence, Google interprets a link from page A to page B as a vote, by page A, for page B...
In a sense, then, PageRank is like a giant electronic voting system. The page that gets the most votes is awarded the highest PageRank (on a scale of 0–10). So, to oversimplify grossly, a page's importance is determined by the number of links pointing to it.
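
To make the oversimplified "one link, one vote" picture concrete, here is a minimal sketch in Python. The link graph, the page names and the count_votes helper are all made up for illustration; this is not how Google actually stores or counts links.

```python
from collections import Counter

# Hypothetical link graph (an assumption for illustration):
# each page maps to the list of pages it links to.
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C"],
}


def count_votes(links):
    """Return how many distinct pages link to each page (its raw 'votes')."""
    votes = Counter({page: 0 for page in links})
    for source, targets in links.items():
        # A source page counts at most once per target, however often it links.
        for target in set(targets):
            votes[target] += 1
    return votes


votes = count_votes(links)

# Most-voted page first; ties are broken alphabetically.
ranking = sorted(votes, key=lambda page: (-votes[page], page))
```

In this toy graph, page C collects a vote from each of A, B and D, so it tops the ranking, while D, which nobody links to, comes last.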

This is not the whole story, however. Google goes on to explain:
Google looks at considerably more than the sheer volume of votes, or links a page receives; for example, it also analyzes the page that casts the vote. Votes cast by pages that are themselves “important” weigh more heavily and help to make other pages “important.” Using these and other factors, Google provides its views on pages’ relative importance.
To continue the voting analogy, Google is not a first-past-the-post system. Not every vote is equal. A vote from a site that has already garnered many votes of its own carries more weight than a vote from a relative unknown. So, to complete the picture, relative importance (or PageRank) is determined by both link quantity and the importance of each link's source. I explore PageRank in more detail in the tracking and tuning section.
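
The weighted-vote idea can be sketched with the iterative calculation described in the originally published PageRank work. Treat this as a rough illustration under stated assumptions, not as Google's production ranking: the damping value of 0.85, the even redistribution of score from pages with no outgoing links, the fixed iteration count and the example graph are all my own choices, and the raw scores below are fractions that sum to 1 rather than the 0–10 scale.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank scores by repeated redistribution of votes.

    links: dict mapping each page to the list of pages it links to.
    damping and iterations are assumed values chosen for illustration.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start with equal importance

    for _ in range(iterations):
        # Every page keeps a small baseline score regardless of its links.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for source in pages:
            targets = links.get(source, [])
            if targets:
                # A page splits its own importance among the pages it votes for.
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outgoing links spreads its
                # importance evenly over all pages.
                share = damping * rank[source] / n
                for page in pages:
                    new_rank[page] += share
        rank = new_rank
    return rank


# Hypothetical graph: X and Y each receive exactly one vote,
# but X's vote comes from "hub", which is itself well linked.
links = {
    "p1": ["hub"],
    "p2": ["hub"],
    "p3": ["hub"],
    "hub": ["X"],
    "unknown": ["Y"],
    "X": [],
    "Y": [],
}

rank = pagerank(links)
```

Although X and Y have the same number of inbound links, X ends up with the higher score, because its single vote comes from a page that has collected votes of its own.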
