Hubs and authorities

From Seo Wiki - Search Engine Optimization and Programming Languages

Hubs and Authorities (also known as the HITS algorithm) is a scheme for ranking web pages by relevance that existed on the web as a precursor to PageRank. The idea behind Hubs and Authorities stemmed from an insight into how web pages were created when the Internet was first forming: certain web pages, known as hubs, served as large directories that were not themselves authoritative on the information they held, but instead compiled a broad catalog of links that led users directly to other, authoritative pages. In other words, a good hub is a page that points to many other pages, and a good authority is a page that is linked to by many different hubs.[1]

The scheme therefore assigns two scores for each page: a hub score and an authority score.

History

In Journals

In the past, many methods were used to rank the importance of scientific journals. One such method was Garfield's impact factor. However, journals such as Science and Nature receive so many citations that they have very high impact factors. Thus, if we compare two more obscure journals that have received roughly the same number of citations, and we discover that one of them has received many of its citations from Science and Nature, then we want that journal to be ranked higher. In other words, it is better to receive citations from an important journal than from an unimportant one.[2]

On the Web

This phenomenon also occurs on the Web. Counting the number of links to a page gives a general estimate of its prominence, but a page with very few incoming links may also be prominent if, say, two of those links come from the home pages of Yahoo!, Google, or MSN. However, because these sites are highly important but are also search engines, ranking pages by such links alone can produce very irrelevant results.

Hubs

Hubs are highly valued lists for a given query. For example, a directory page from a major encyclopedia or paper that links to many different highly linked pages would typically have a higher hub score than a page that links to relatively few other sources.

Authorities

Authorities are highly endorsed answers to the query. A page that is particularly popular and linked to by many different directories will typically have a higher authority score than a page that is unpopular.

The Algorithm

To begin the ranking, every page p is given the initial scores <math>auth(p) = 1</math> and <math>hub(p) = 1</math>. There are two types of updates: the Authority Update Rule and the Hub Update Rule. The hub and authority scores of each node are calculated by applying the two rules repeatedly. A k-step application of the Hub-Authority algorithm entails applying, k times, first the Authority Update Rule and then the Hub Update Rule.

Authority Update Rule

<math>\forall p</math>, we update <math>auth(p)</math> to be:

<math>\displaystyle\sum_{i=1}^n hub(i)</math>

where n is the total number of pages that link to p and i is a page that links to p. That is, the Authority score of a page is the sum of the Hub scores of all the pages that point to it.
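The Authority Update Rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the graph is stored as a dictionary named `links` that maps each page to the list of pages it links to, every linked page also appears as a key, and the function name `authority_update` and the three-page graph are hypothetical.

```python
def authority_update(links, hub):
    """Return new authority scores from the current hub scores.

    links: dict mapping each page to the list of pages it links to
           (assumed representation; every target must also be a key).
    hub:   dict mapping each page to its current hub score.
    """
    auth = {page: 0.0 for page in links}
    for source, targets in links.items():
        for target in targets:
            # Each page that points to `target` contributes its hub score.
            auth[target] += hub[source]
    return auth


# Hypothetical graph: pages A and B both link to page C.
links = {"A": ["C"], "B": ["C"], "C": []}
hub = {page: 1.0 for page in links}  # initial hub scores of 1
auth = authority_update(links, hub)
```

In this small graph, C is pointed to by two pages with hub score 1, so its authority score becomes 2, while A and B, which have no incoming links, get 0.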

Hub Update Rule

<math>\forall p</math>, we update <math>hub(p)</math> to be:

<math>\displaystyle\sum_{i=1}^n auth(i)</math>

where n is the total number of pages that p links to and i is a page that p links to. Thus a page's Hub score is the sum of the Authority scores of all the pages it links to.
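A matching sketch of the Hub Update Rule follows, again assuming the graph is a dictionary mapping each page to the list of pages it links to. The name `hub_update`, the example graph, and the starting authority scores are hypothetical and chosen only for illustration.

```python
def hub_update(links, auth):
    """Return new hub scores from the current authority scores.

    links: dict mapping each page to the list of pages it links to
           (assumed representation).
    auth:  dict mapping each page to its current authority score.
    """
    # A page's hub score is the sum of the authority scores of its targets.
    return {
        page: sum((auth[target] for target in targets), 0.0)
        for page, targets in links.items()
    }


# Hypothetical graph: A links to B and C, B links to C, C links nowhere.
links = {"A": ["B", "C"], "B": ["C"], "C": []}
auth = {"A": 0.0, "B": 1.0, "C": 2.0}  # illustrative authority scores
hub = hub_update(links, auth)
```

Here A links to both B and C, so its hub score is 1 + 2 = 3; B links only to C and gets 2; C has no outgoing links and gets 0.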

Normalization

The final hub-authority scores of nodes are determined after infinite repetitions of the algorithm, that is, as the limit of the iteration. Because applying the Hub Update Rule and the Authority Update Rule directly and repeatedly leads to diverging values, it is necessary to normalize the scores after every iteration. The values obtained from this process will then eventually converge.[3]
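The whole procedure, including normalization, might look like the following sketch. Several choices here are assumptions rather than part of the description above: the graph is a dictionary mapping each page to the pages it links to, scores are normalized by dividing by their sum (dividing by the Euclidean norm is another common choice), and a fixed number of steps `k` stands in for "infinite repetitions". The names `hits` and `normalize` and the four-page graph are hypothetical.

```python
def normalize(scores):
    """Scale scores so they sum to 1 (assumed normalization choice)."""
    total = sum(scores.values())
    if total == 0:
        return dict(scores)  # nothing to scale; avoid division by zero
    return {page: value / total for page, value in scores.items()}


def hits(links, k):
    """Run k steps of the hub/authority updates with normalization.

    links: dict mapping each page to the list of pages it links to;
           every target is assumed to also appear as a key.
    """
    auth = {page: 1.0 for page in links}
    hub = {page: 1.0 for page in links}
    for _ in range(k):
        # Authority Update Rule: sum the hub scores of pages linking in.
        new_auth = {page: 0.0 for page in links}
        for source, targets in links.items():
            for target in targets:
                new_auth[target] += hub[source]
        auth = normalize(new_auth)
        # Hub Update Rule: sum the authority scores of pages linked to.
        hub = normalize({
            page: sum((auth[target] for target in targets), 0.0)
            for page, targets in links.items()
        })
    return auth, hub


# Hypothetical four-page graph used only for illustration.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
auth, hub = hits(links, 20)
best_authority = max(auth, key=auth.get)
best_hub = max(hub, key=hub.get)
```

With this graph, C is linked to by three pages and ends up as the strongest authority, while A, which links to both B and C, ends up as the strongest hub. Because the scores are rescaled after every update, they stay bounded instead of growing with each step.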

Notes

  1. "Introduction to Information Retrieval" (HTML). Cambridge University Press. 2008. http://nlp.stanford.edu/IR-book/html/htmledition/hubs-and-authorities-1.html. Retrieved 2008-11-09. 
  2. Kleinberg, Jon (1999-12). "Hubs, Authorities, and Communities". Cornell University. http://www.cs.brown.edu/memex/ACM_HypertextTestbed/papers/10.html. Retrieved 2008-11-09. 
  3. von Ahn, Luis (2008-10-19). "Hubs and Authorities" (PDF). 15-396: Science of the Web Course Notes. Carnegie Mellon University. http://www.scienceoftheweb.org/15-396/lectures/lecture13.pdf. Retrieved 2008-11-09. 