Thursday, July 24, 2008

How Google Ranks

The route to a good Google ranking is to earn a large number of high-quality links to your pages from important, relevant, and reliable sources.

PageRank: What Google deems important

The heart of the Google algorithm is the link-based system developed at Stanford, called PageRank (after Larry Page, its inventor, rather than after the pages themselves). Google explains PageRank in the following way:
PageRank relies on the uniquely democratic nature of the web by using its vast link structure as an indicator of an individual page’s value. In essence, Google interprets a link from page A to page B as a vote, by page A, for page B...
In a sense, then, PageRank is like a giant electronic voting system. The page that gets the most votes gets awarded the highest PageRank (on a scale of 0–10). So, grossly oversimplifying, simple importance is determined by link quantity.

This is not the whole story, however. Google goes on to explain:
Google looks at considerably more than the sheer volume of votes, or links a page receives; for example, it also analyzes the page that casts the vote. Votes cast by pages that are themselves “important” weigh more heavily and help to make other pages “important.” Using these and other factors, Google provides its views on pages’ relative importance.
To continue the voting analogy, Google is not a first-past-the-post system. Not every vote is equal. If you get a vote from another site that has already garnered many votes of its own, this will carry a greater weighting than a vote from a relative unknown. So, to complete the picture, relative importance (or PageRank) is determined by both link quantity and link source importance. I explore PageRank in more detail in the tracking and tuning section.
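
The weighted-voting idea can be sketched in a few lines of Python. This is only a simplified textbook version of PageRank, not Google's production algorithm: the damping factor of 0.85, the iteration count, and the tiny link graph are all assumptions made for illustration, and the scores it produces are fractions that sum to 1 rather than values on the 0–10 scale mentioned above.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by repeated vote passing.

    links maps each page name to the list of pages it links to.
    damping=0.85 is an assumed, commonly quoted value.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with every page equally important.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page keeps a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its vote over all pages.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical link graph: "a" and "b" each receive exactly one link,
# but the page voting for "a" has three votes of its own.
EXAMPLE_LINKS = {
    "fan1": ["hub"],
    "fan2": ["hub"],
    "fan3": ["hub"],
    "hub": ["a"],
    "unknown": ["b"],
    "a": [],
    "b": [],
}

if __name__ == "__main__":
    for page, score in sorted(pagerank(EXAMPLE_LINKS).items()):
        print(f"{page:8s} {score:.4f}")
```

In this made-up graph, page "a" ends up with a higher score than page "b" even though each has a single inbound link, because the vote for "a" comes from a page that has itself collected votes.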

Text matching: What Google deems relevant

OK, let’s assume you have cracked relative importance and have a PageRank of 10. Now your page ranks number one on Google for every single related search undertaken by a user, right? Wrong!

The easiest way to illustrate this is by means of an example. I know that the Guggenheim Museum in New York is a relatively important art gallery, as it is cited as such by many important sources. However, the Guggenheim focuses on modern art, so if I were searching for seventeenth-century landscapes I would be unlikely to find its content relevant to my search.

If, however, I were searching for the art of Piet Mondrian, then the Guggenheim’s importance – when combined with its relevance to Mondrian and his work – should absolutely ensure that it appears near the top of the rankings. Try the search “Piet Mondrian” in Google or Yahoo! and you will see that the Guggenheim does indeed feature in the top 10 (although MSN and Ask notably fail to include it there). So how do the people at Google do this? Well, as they themselves put it, the search engine:

goes far beyond the number of times a term appears on a page and examines dozens of aspects of the page’s content (and the content of the pages linking to it) to determine if it’s a good match for your query.


In short, the Guggenheim page about Piet Mondrian ranks so well because many sites about the artist (with lots of text containing the word Mondrian) link to the Guggenheim, often with Mondrian in the link text. See for example www.pietmondrian.org and http://en.wikipedia.org/wiki/Piet_Mondrian.

This sophisticated process is known as text matching. From my years of SEO experience, the most important of all these text-matching factors is link quality: the extent to which the links pointing to your pages use your phrases that pay in their anchor text. The closer the anchor text is to your desired keyphrases, the greater the link quality will be. In the next section I will prove to you just how important link quality is.
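
As a rough way to picture link quality, the sketch below counts what fraction of a page's inbound links carry a given keyphrase in their anchor text. It is a toy measure of my own for illustration, not a formula Google has published; the function name and the sample anchor texts are hypothetical.

```python
import re


def anchor_match_ratio(anchor_texts, keyphrase):
    """Return the fraction of anchor texts containing every word of keyphrase.

    Matching is case-insensitive and ignores punctuation and word order.
    """
    phrase_words = re.findall(r"\w+", keyphrase.lower())
    if not anchor_texts or not phrase_words:
        return 0.0

    matches = 0
    for text in anchor_texts:
        words = set(re.findall(r"\w+", text.lower()))
        if all(word in words for word in phrase_words):
            matches += 1
    return matches / len(anchor_texts)


# Hypothetical anchor texts of four links pointing at one page.
EXAMPLE_ANCHORS = [
    "Piet Mondrian at the Guggenheim",
    "click here",
    "Mondrian biography",
    "Guggenheim Museum",
]

if __name__ == "__main__":
    print(anchor_match_ratio(EXAMPLE_ANCHORS, "Mondrian"))
    print(anchor_match_ratio(EXAMPLE_ANCHORS, "Piet Mondrian"))
```

Two of the four sample links mention Mondrian, so the ratio for that single word is 0.5; only one contains both "Piet" and "Mondrian", so the ratio for the full phrase is 0.25. A link reading "click here" contributes nothing to either.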

TrustRank: What Google deems reliable

The more established a site or its links are, the more “trust” and weighting Google appears to assign to it in its algorithm, a factor often known as TrustRank. Links from well-established sites like the Yahoo! directory, the Open Directory Project, .gov and .edu sites, and other sites established in the earliest days of the web carry greater source reliability and weighting in the ranking of your site.

You can find out the age of a web domain (and a whole host of other useful facts) by paying a visit to a decent Whois tool. On my forum I list a selection of the best, but for the purposes of this illustration, pay a visit to Domain Tools (http://whois.domaintools.com). Enter the domain you want to investigate like this:

http://whois.domaintools.com/yourdomain.com

The tool returns basic page information, indexed data (from the Open Directory Project and Wikipedia), and server data (including the country in which the IP is based). It also provides registrar data and the full Whois record (including the date on which the domain was first registered).
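
If you check a lot of domains, a small helper can save some typing. The sketch below simply builds the lookup address in the pattern shown above and works out a domain's age from a registration date that you copy out of the Whois record by hand. It does not call any Domain Tools service, the function names are my own, and stripping a leading "www." is an assumption about what you would want to look up.

```python
from datetime import date

WHOIS_BASE = "http://whois.domaintools.com/"


def whois_lookup_url(domain):
    """Build a lookup address in the pattern shown above from a domain or URL."""
    domain = domain.strip().lower()
    # Drop the scheme if a full URL was pasted in.
    for prefix in ("http://", "https://"):
        if domain.startswith(prefix):
            domain = domain[len(prefix):]
    # Assumption: look up the bare domain rather than the www host.
    if domain.startswith("www."):
        domain = domain[len("www."):]
    # Drop any path after the domain name.
    domain = domain.split("/")[0]
    return WHOIS_BASE + domain


def domain_age_years(registered_on, today=None):
    """Return the number of whole years since the registration date."""
    today = today or date.today()
    years = today.year - registered_on.year
    # Subtract a year if this year's anniversary has not yet arrived.
    if (today.month, today.day) < (registered_on.month, registered_on.day):
        years -= 1
    return years


if __name__ == "__main__":
    print(whois_lookup_url("http://www.yourdomain.com/index.html"))
    # Hypothetical registration date, as read from a Whois record.
    print(domain_age_years(date(1995, 8, 14)))
```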