Université Montpellier II
Doctoral Dissertation
Web Graphs
PageRank-like Importance Measures
Defended on December 8, 2004 before a committee of:
- M. Serge Abiteboul, Directeur de recherche, INRIA — reviewer
- M. Vincent Blondel, Professor, UCL (Belgium) — guest
- M. Pierre Fraigniaud, Directeur de recherche, CNRS — reviewer
- M. Michel Habib, Professor, LIRMM, Montpellier — thesis advisor
- M. Alain Jean-Marie, Directeur de recherche, INRIA — chair
- M. Laurent Viennot, Research Scientist, INRIA — co-advisor
The purpose of this thesis is to apply PageRank-like measures to Web graphs. The first part introduces the Web graphs. First we define the notion of indexable Web, then we give an insight on how big the effective crawls really are. Finally, we notice and use some of the structures that exist on the portions of the Web known as Web graphs.
Then, the second part study deeply the PageRank algorithms. After a remainder on Markov chains theory is given an original classification of PageRank algorithms. From a basic model, we incorporate all the specificities needed to cope with real Web graphs. Lastly, new algorithms are proposed. BackRank uses an alternative random surfer modeling leading to a faster computation. The highly clustered structure of Web graphs allows a PageRank decomposition according to Web sites, and is the reason for introducing the algorithms FlowRank and BlowRank.