Chapter 4 — Markov Chains
> *Always buy an unowned property if it is an orange property (always block this group if you can).*
Before tackling the central topic of this part, namely PageRank(s), it seems necessary to provide a theoretical review of the techniques that we will use throughout the following pages.
Andrey Markov (1856–1922), a Russian mathematician, is known for having defined and studied memoryless discrete stochastic processes, also known as Markov chains. This tool from the beginning of the last century is fundamental for understanding the idea of PageRank, which is why we will briefly recall the essential results1.
4.1 Definitions
A discrete random, or stochastic, process is a set of random variables $(X_t)$, with $t \in \mathbb{N}$. $X_t$ can for instance represent the position on the game board of a Monopoly player at time $t$.
Discrete Markov chains are special cases of discrete stochastic processes with discrete values whose future depends only on the present (not on the past): the states preceding the present state play no role. If we call $E$ the set of values (or states) that the variables $X_t$ can take, the mathematical characterization of a Markov chain is the equality, for all $t \in \mathbb{N}$ and all states $i_0, \dots, i_{t-1}, i, j \in E$,

$$P(X_{t+1} = j \mid X_t = i, X_{t-1} = i_{t-1}, \dots, X_0 = i_0) = P(X_{t+1} = j \mid X_t = i).$$
In other words, the probability of being in state $j$ at time $t+1$ depends only on the value taken at time $t$, and not on earlier values. A priori, this probability may not be the same depending on the time considered. The Markov chain can therefore be defined by the transition probabilities between two states $i$ and $j$ at time $t$:

$$p_{i,j}(t) = P(X_{t+1} = j \mid X_t = i).$$
Throughout this thesis, we will restrict ourselves to the case where the transition probabilities do not depend on the time considered: $p_{i,j}(t) = p_{i,j}$ for all $t$. The corresponding Markov chain is then said to be homogeneous.
If $E$ is finite (we will then take $E = \{1, \dots, n\}$), it is convenient to consider the transition matrix $A = (p_{i,j})_{1 \le i,j \le n}$. $A$ is a row-stochastic matrix, meaning that $\sum_{j=1}^{n} p_{i,j} = 1$ for every $i$. This is due to the fact that the process never leaves $E$: if one is in state $i$ at time $t$, then one will surely be in some state of $E$ at time $t+1$.
The transition matrix allows one to obtain the evolution of our process. Indeed, if we call $\pi_t$ the vector representing the state distribution at time $t$ (that is, $\pi_t(i) = P(X_t = i)$), then the following proposition gives $\pi_t$ as a function of $\pi_0$ and $A$.
Proposition 4.1.
For all $t \in \mathbb{N}$, $\pi_t = (A^T)^t \pi_0$, where $\cdot^T$ is the transpose operator2.
Proof.
It suffices to show that, for any $t$, we have $\pi_{t+1} = A^T \pi_t$; Proposition 4.1 is then merely the application of a straightforward induction. The desired result follows from the fact that, for every state $j$:

$$\pi_{t+1}(j) = P(X_{t+1} = j) = \sum_{i=1}^{n} P(X_{t+1} = j \mid X_t = i)\, P(X_t = i) = \sum_{i=1}^{n} p_{i,j}\, \pi_t(i) = (A^T \pi_t)(j).$$
□
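Proposition 4.1 is easy to check numerically; a minimal sketch on a hypothetical 2-state chain (the transition values below are illustrative, not from the text):

```python
import numpy as np

# Hypothetical 2-state chain. Row i holds the transition
# probabilities out of state i (row-stochastic matrix).
A = np.array([[0.9, 0.1],
              [0.5, 0.5]])

pi0 = np.array([1.0, 0.0])          # start surely in state 0
pi1 = A.T @ pi0                     # Proposition 4.1: pi_{t+1} = A^T pi_t
pi2 = A.T @ pi1

# Iterating agrees with applying (A^T)^t directly, and mass is preserved.
assert np.allclose(pi2, np.linalg.matrix_power(A.T, 2) @ pi0)
assert np.isclose(pi2.sum(), 1.0)
print(pi1, pi2)
```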
4.2 Graph side, matrix side
We have just seen that stochastic matrices are an elegant and compact means of describing the evolution of a Markov chain, but they are not the only one. The representation as a weighted directed graph is also very useful, and it is convenient to be able to switch from one to the other as needed.
To any matrix $A$ of size $n \times n$, it is possible to associate a weighted directed graph $G = (V, E)$ with $n$ vertices, whose edges are the pairs $(i, j)$ such that $a_{i,j} \neq 0$, each weighted by its associated coefficient $a_{i,j}$.
Conversely, to any weighted directed graph $G = (V, E)$, with weight $w(i, j)$ on the edge $(i, j)$, one can associate the matrix $A$ defined by:

$$a_{i,j} = \begin{cases} w(i,j) & \text{if } (i,j) \in E, \\ 0 & \text{otherwise.} \end{cases}$$
In the case of a graph whose edges are unweighted, by considering them as implicitly having weight $1$, one recovers the adjacency matrix.
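The two conversions can be sketched in a few lines (the edge dictionary below is a hypothetical example):

```python
import numpy as np

# Hypothetical weighted directed graph on 3 vertices, as {(i, j): weight}.
edges = {(0, 1): 0.5, (0, 2): 0.5, (1, 2): 1.0, (2, 0): 1.0}

n = 3
A = np.zeros((n, n))
for (i, j), w in edges.items():
    A[i, j] = w                     # a_ij = w(i, j), 0 if (i, j) is no edge

# Converse direction: recover the edge set from the nonzero coefficients.
recovered = {(i, j): A[i, j] for i in range(n) for j in range(n) if A[i, j] != 0}
assert recovered == edges

# Unweighted graphs (all weights 1) give back the adjacency matrix.
adjacency = (A != 0).astype(int)
print(adjacency)
```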
Sometimes, the passage from the matrix viewpoint to the graph representation amounts to a simple rewriting: thus, if the characterization of a stochastic matrix3 is

$$\forall i, \quad \sum_{j=1}^{n} a_{i,j} = 1,$$

that of a graph representing a homogeneous Markov chain is

$$\forall i \in V, \quad \sum_{j \,:\, (i,j) \in E} w(i,j) = 1.$$
However, it sometimes happens that the two approaches correspond to two genuinely, subtly different views of the same problem. Thus, saying that a nonnegative matrix $A$ is irreducible amounts to saying that

$$\forall (i, j), \ \exists k \ge 1, \quad (A^k)_{i,j} > 0.$$

At the graph level, for the corresponding graph $G$, this amounts to saying that for every pair of vertices $(i, j)$, there exists a path (of length $k$) connecting $i$ to $j$. In other words, the irreducibility of the matrix translates into the strong connectivity of the graph.
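Irreducibility can thus be tested on the graph side: a nonnegative $n \times n$ matrix $A$ is irreducible if and only if $(I + A)^{n-1}$ has no zero entry, a compact way of checking strong connectivity. A sketch:

```python
import numpy as np

def is_irreducible(A: np.ndarray) -> bool:
    """Nonnegative A is irreducible iff its graph is strongly connected,
    which (I + A)^(n-1) > 0 tests entrywise."""
    n = A.shape[0]
    M = np.linalg.matrix_power(np.eye(n) + (A > 0), n - 1)
    return bool((M > 0).all())

# A 3-cycle is strongly connected...
cycle = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]], dtype=float)
assert is_irreducible(cycle)

# ...but adding an absorbing state breaks strong connectivity.
absorbing = np.array([[0.5, 0.5], [0.0, 1.0]])
assert not is_irreducible(absorbing)
```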
Finally, let us mention that a Markov chain is said to be ergodic if the corresponding matrix is irreducible and aperiodic, aperiodicity meaning that the gcd of the lengths of the circuits of the corresponding graph (the period) equals $1$.
4.3 Evolution of a homogeneous Markov chain
In order to study the long-term evolution of a Markov chain, one may ask whether convergence properties can be obtained. This is indeed guaranteed by the following theorem.
Theorem 4.1.
Let $A$ be a stochastic matrix.

- The spectral radius of $A$ is $1$, and $1$ is an eigenvalue of $A$.
- If $A$ is irreducible, then there exists a unique probability vector $\pi$ that is a right eigenvector of $A^T$ for the eigenvalue $1$, and $\pi$ is strictly positive, meaning that all its components are strictly positive.
- If $A$ is irreducible and aperiodic, then all eigenvalues of $A$ other than $1$ have modulus strictly less than $1$.
Proof.
-
$1$ is an eigenvalue, associated with the right eigenvector $\mathbf{1} = (1, \dots, 1)^T$, since $A$ is row-stochastic.
Moreover, if one considers an arbitrary vector $x$, we always have

$$\|Ax\|_\infty = \max_i \Big| \sum_j a_{i,j}\, x_j \Big| \le \max_i \sum_j a_{i,j}\, |x_j| \le \|x\|_\infty,$$

with the convention $\|x\|_\infty = \max_i |x_i|$.
This implies that every eigenvalue of $A$ has modulus at most $1$.
-
If $A$ is irreducible, then we are within the framework of the Perron-Frobenius theorem (see Appendix A), which ensures that there exists, up to scaling, a unique eigenvector4 $\pi$ of $A^T$ for the eigenvalue $1$. Since this vector is strictly positive, after normalization it can be considered as a probability vector.
-
By the Perron-Frobenius theorem, if $\lambda$ is an eigenvalue of $A$ satisfying $|\lambda| = 1$, then $\lambda$ is a $c$-th root of unity, where $c$ is the cyclicity of $A$. If $A$ is aperiodic, then necessarily $c = 1$, hence $\lambda = 1$. All eigenvalues of $A$ other than $1$ therefore have modulus strictly less than $1$.
□
By Theorem 4.1, it is now possible to determine the asymptotic behavior of a homogeneous Markov chain.
Theorem 4.2.
Let $A$ be an irreducible aperiodic stochastic matrix of size $n$ representing a homogeneous Markov chain. If we call $\pi$ the probability vector that is a right eigenvector of $A^T$ for the eigenvalue $1$ (whose existence and uniqueness are guaranteed by the Perron-Frobenius theorem), then

$$\lim_{t \to \infty} (A^T)^t = \pi \mathbf{1},$$

where $\mathbf{1}$ is the row vector of size $n$ consisting entirely of $1$s.
Proof.
This is a simple application of the power method, or Jacobi method. Indeed, since $1$ is an eigenvalue of multiplicity $1$, one can decompose $\mathbb{R}^n$ into the eigenspace spanned by $\pi$ on one hand and the stable subspace associated with the other eigenvalues on the other hand (which is not necessarily an eigenspace). Thus, there exists an invertible change-of-basis matrix $Q$ such that

$$A^T = Q \begin{pmatrix} 1 & 0 \\ 0 & R \end{pmatrix} Q^{-1},$$

where $R$ is a matrix of size $(n-1) \times (n-1)$ with spectral radius strictly less than $1$ (the eigenvalues of $R$ are those of $A$ except $1$). The spectral radius of $R$ implies that $R^t$ converges geometrically to $0$. $(A^T)^t$ therefore converges geometrically to:

$$P = Q \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} Q^{-1}.$$

This matrix, which we will call $P$, is in fact a projection, since $P^2 = P$. The projection space is one-dimensional, as the matrix $P$ has rank $1$. Since by passing to the limit we have $A^T P = P$, the projection line is the one spanned by $\pi$. $P$ is also column-stochastic (as a limit of the column-stochastic matrices $(A^T)^t$). In particular, if $e_i$ is the certain probability vector at $i$, defined by $e_i(j) = \delta_{i,j}$, then $P e_i$ is a probability vector carried by the line spanned by $\pi$, so that $P e_i = \pi$ for every $i$,
hence $P = \pi \mathbf{1}$.
□
The asymptotic behavior of the associated homogeneous Markov chain is then given by Corollary 4.1:
Corollary 4.1.
Let $\pi_0$ be an arbitrary initial probability distribution. The state distribution at time $t$, $\pi_t = (A^T)^t \pi_0$, converges, as $t$ tends to infinity, to $\pi$.
Proof.
For any probability distribution $\pi_0$, we have $(\pi \mathbf{1})\, \pi_0 = \pi\, (\mathbf{1}\, \pi_0) = \pi$, since $\mathbf{1}\, \pi_0 = 1$.
□
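Corollary 4.1 suggests the obvious numerical method: iterate $\pi_{t+1} = A^T \pi_t$ until stabilization (this is the power method that will reappear for PageRank). A sketch on hypothetical values:

```python
import numpy as np

# Hypothetical irreducible aperiodic 3-state chain (all entries positive).
A = np.array([[0.5, 0.25, 0.25],
              [0.2, 0.6,  0.2 ],
              [0.3, 0.3,  0.4 ]])

pi = np.ones(3) / 3                 # arbitrary initial distribution pi_0
for _ in range(200):
    pi = A.T @ pi                   # pi_{t+1} = A^T pi_t

# pi is now (numerically) the stationary distribution: A^T pi = pi.
assert np.allclose(A.T @ pi, pi)
assert np.isclose(pi.sum(), 1.0)
print(pi)
```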
For the reader wishing to become familiar with the applications of the asymptotic study of Markov chains, the following section is devoted to a brief analysis of probabilities in the game of Monopoly (registered trademark).
4.4 Intermezzo: Monopoly™ According to Markov
| No. | Name | Group | No. | Name | Group |
|---|---|---|---|---|---|
| 0 | Go | | 21 | Matignon | red |
| 1 | Belleville | brown | 22 | Chance Card | |
| 2 | Community Chest | | 23 | Malesherbes | red |
| 3 | Lecourbe | brown | 24 | Henri-Martin | red |
| 4 | Income Tax | | 25 | Gare du Nord | |
| 5 | Gare Montparnasse | | 26 | Saint-Honoré | yellow |
| 6 | Vaugirard | light blue | 27 | Bourse | yellow |
| 7 | Chance Card | | 28 | Water Works | |
| 8 | Courcelles | light blue | 29 | La Fayette | yellow |
| 9 | République | light blue | 30 | Go to Jail | |
| 10 | Just Visiting | | 31 | Breteuil | green |
| 11 | La Villette | purple | 32 | Foch | green |
| 12 | Electric Company | | 33 | Community Chest | |
| 13 | Neuilly | purple | 34 | Capucines | green |
| 14 | Paradis | purple | 35 | Gare Saint-Lazare | |
| 15 | Gare de Lyon | | 36 | Chance Card | |
| 16 | Mozart | orange | 37 | Champs-Élysées | dark blue |
| 17 | Community Chest | | 38 | Luxury Tax | |
| 18 | Saint-Michel | orange | 39 | Rue de la Paix | dark blue |
| 19 | Pigalle | orange | 40 | Jail | |
| 20 | Free Parking | | | | |
A natural question to ask is: what are the chances of landing on a given square? If certain squares are more likely than others, they will clearly be of greater strategic interest. The evolution of a player’s position over successive dice rolls can be viewed as a Markov chain. Ian Stewart [Ste96] associates to each square-state 11 possible transitions corresponding to the possible totals of a two-dice roll, from 2 to 12, each transition having the probability of obtaining that total with two dice. Stewart concludes that the stochastic matrix representing the Markov chain is circulant, and that the asymptotic probability distribution is the uniform distribution. In fact, if one looks more closely at the rules, one notices that not all states are equivalent, and that the limiting probability distribution is not necessarily uniform.
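Stewart’s simplified model is easy to check numerically: with only the two-dice displacement and no special rules, the 40-square chain is circulant and its limiting distribution is uniform. A minimal sketch:

```python
import numpy as np

# Probability of each two-dice total 2..12: (1,2,3,4,5,6,5,4,3,2,1)/36.
dice = {t: sum(1 for a in range(1, 7) for b in range(1, 7) if a + b == t) / 36
        for t in range(2, 13)}

n = 40                              # bare board: no jail rule, no cards
A = np.zeros((n, n))
for i in range(n):
    for t, p in dice.items():
        A[i, (i + t) % n] += p      # circulant: each row is the previous one shifted

pi = np.zeros(n)
pi[0] = 1.0                         # start on Go
for _ in range(2000):
    pi = A.T @ pi

assert np.allclose(pi, 1 / n)       # the limiting distribution is uniform
```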
4.4.1 Brief Reminder of the Rules and Notation
By convention, we consider that there are 41 squares: from the Go square (number 0) to the Rue de la Paix square (number 39), with the Jail square having number 40 and the Just Visiting square having number 10. Table 4.1 summarizes the different squares, with the name and possible color group.
A game begins on the Go square. At each turn, the player rolls two dice. After three consecutive doubles, the player goes to jail. If the player lands on a Chance or Community Chest square, they draw a card from the corresponding pile, and this draw is possibly followed by an immediate effect on their position. When in jail, one can get out for free by rolling doubles within the three turns following imprisonment; otherwise, one must pay to get out. One can also pay to get out before the end of the three turns.
Here is the detailed list of Chance cards: 1 sends to jail, 1 sends to Avenue Henri-Martin, 1 sends to Boulevard de la Villette, 1 sends to Rue de la Paix, 1 sends to Gare de Lyon, 1 sends to the Go square, 1 Go back three spaces. There are 9 other Chance cards that have no effect on position.
Here is the detailed list of Community Chest cards: 1 Return to Belleville, 1 sends to jail, 1 sends to the Go square, 1 option to draw a Chance card (alternative with a fine). There are 12 other Community Chest cards that have no effect on position.
4.4.2 Transition Matrix
Because of the rules, not all states have the same transitions: thus, any transition to square 30 (Go to Jail) must in fact be replaced by a transition to square 40 (Jail). Similarly, for any transition to the Chance or Community Chest squares, the possible redirections must be considered. There is also the problem of doubles: the stochastic process uses memory (number of turns in jail or number of consecutive doubles already rolled) and is therefore not a true Markov process. But since the memory is finite (3 rolls), one can reduce it to a memoryless process by considering a space with 123 states5: 120 states of the form $(c, d)$, representing being on square $c$ ($0 \le c \le 39$) having already rolled $d$ consecutive doubles ($0 \le d \le 2$), and 3 jail states representing the three turns one can spend in jail. It should also be noted that if one pays, as one often has an interest in doing early in the game, one spends only one turn in jail, and the transitions are therefore modified. One must therefore consider the transitions for a jail strategy and those for a freedom strategy6.
One thus arrives at writing, for each of the two strategies considered, a stochastic matrix describing the strategy in question. Figure 4.2 graphically represents the matrix corresponding to the jail strategy.
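A deliberately coarse sketch of such a construction is given below. It keeps only the Go-to-Jail redirection and models jail as a single pay-to-leave state (one turn in jail, then moving from square 10), ignoring cards and the doubles memory, so it is far simpler than the 123-state chain described above:

```python
import numpy as np

dice = {t: sum(1 for a in range(1, 7) for b in range(1, 7) if a + b == t) / 36
        for t in range(2, 13)}

n = 41                              # squares 0..39 plus Jail = 40
A = np.zeros((n, n))
for i in range(n):
    start = 10 if i == 40 else i    # leaving jail, one moves from square 10
    for t, p in dice.items():
        j = (start + t) % 40
        if j == 30:                 # Go to Jail redirects to the Jail state
            j = 40
        A[i, j] += p

pi = np.zeros(n)
pi[0] = 1.0
for _ in range(5000):
    pi = A.T @ pi                   # power iteration toward the limit

assert np.isclose(pi[30], 0.0)      # nobody "stays" on Go to Jail
assert np.allclose(A.T @ pi, pi)    # pi is (numerically) stationary
print(pi.round(4))
```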
4.4.3 Asymptotic Probabilities and Conclusion
Once the transition matrix is computed, thanks to Corollary 4.1, we know that it suffices to iterate $\pi_{t+1} = A^T \pi_t$ ($\pi_0$ being for example the uniform distribution) to obtain convergence to the asymptotic distribution. One then only needs to project back onto the space of squares to obtain usable results, which are shown in Figure 4.3. To complete this study, one would now need to take into account the sale prices and rents to obtain a mean time to return on investment, without forgetting to consider purchasing power, but this is no longer within the scope of Markov matrices7. Let us conclude with just these few remarks:
- The Go to Jail square has zero probability, since one does not stay there.
- Similarly, the Chance and Community Chest squares have fairly low probability, because of the immediate displacement cards.
- The second and third quarters of the board generally have higher probabilities, because of the exit from jail. This confers a definite interest to the properties located there (the orange and red groups in particular).
- Paradoxically, one is more likely to land on the Electric Company by choosing to stay in jail. Why this counterintuitive result? With the freedom strategy, the probability of landing on it upon leaving jail is $1/36$ (one must roll a total of $2$). With the jail strategy, one leaves jail only on a double, and this probability becomes $1/6$ (the double $1$, among six equally likely doubles)…
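The two probabilities compared in the last remark follow from the rules alone; a quick enumeration over the 36 equally likely two-dice outcomes (a sketch, assuming the jail exit moves the player from square 10):

```python
from fractions import Fraction

rolls = [(a, b) for a in range(1, 7) for b in range(1, 7)]

# Freedom strategy: from Just Visiting (10), Electric Company (12)
# requires a total of 2.
p_free = Fraction(sum(1 for a, b in rolls if a + b == 2), len(rolls))

# Jail strategy: one leaves only on a double, and double 1 gives total 2.
doubles = [(a, b) for a, b in rolls if a == b]
p_jail = Fraction(sum(1 for a, b in doubles if a + b == 2), len(doubles))

print(p_free, p_jail)
assert p_jail > p_free              # jail strategy favors Electric Company
```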
4.5 (Sub-)Stochastic Matrices: General Case
In the following chapters, we will sometimes deal with matrices exhibiting periodicity, or that are non-irreducible, or even sub-stochastic8, where the “or” is not necessarily exclusive. We therefore propose to study the viability of Corollary 4.1 under these different hypotheses.
4.5.1 Non-Irreducible Matrices
Let us first consider the case where $A$ is stochastic but not irreducible. This means that the corresponding graph $G$ is not strongly connected. Let us then consider the decomposition of $G$ into strongly connected components $C_1, \dots, C_m$. Each component has exactly one of the two following properties:

- Either some edge leaves the component (there exist $i \in C_k$ and $j \notin C_k$ with $a_{i,j} > 0$). The component and its states are then called transient.
- Or no edge leaves the component. The component and its states are then called recurrent.
If one quotients $G$ by its strongly connected components, the reduced graph $G'$ gives a partial order on the components (since there are no circuits), whose maxima are the recurrent components.
For example, the graph of Figure 4.4 has four strongly connected components: two of them are transient, while the other two are recurrent.
We will now reorder the states as follows: first the transient states (all components combined), then the states of a first recurrent strongly connected component, …, up to the states of the last recurrent strongly connected component. In this rearrangement of states, the associated stochastic matrix (which we will continue to call $A$) can now be written:

$$A = \begin{pmatrix} T & S_1 & \cdots & S_r \\ 0 & A_1 & & 0 \\ \vdots & & \ddots & \\ 0 & 0 & & A_r \end{pmatrix},$$

where $T$ is a sub-stochastic, non-stochastic matrix of size $\tau \times \tau$ ($\tau$ being the number of transient states), the $S_k$ are nonnegative matrices of size $\tau \times n_k$ (not all zero), and the $A_k$ are irreducible stochastic matrices of size $n_k \times n_k$.
Theorem 4.3.
Let $A$ be a stochastic matrix reduced according to its transient and recurrent strongly connected components. $A$ is of the form

$$A = \begin{pmatrix} T & S_1 & \cdots & S_r \\ 0 & A_1 & & 0 \\ \vdots & & \ddots & \\ 0 & 0 & & A_r \end{pmatrix},$$

where $T$ is a sub-stochastic, non-stochastic matrix of size $\tau \times \tau$, the $S_k$ are nonnegative matrices of size $\tau \times n_k$ (not all zero), and the $A_k$ are irreducible stochastic matrices of size $n_k \times n_k$.

If all the matrices $A_k$ are aperiodic, then the iterated powers of $A$ converge. More precisely, we have:

$$\lim_{t \to \infty} A^t = \begin{pmatrix} 0 & R_1 & \cdots & R_r \\ 0 & L_1 & & 0 \\ \vdots & & \ddots & \\ 0 & 0 & & L_r \end{pmatrix},$$

with

$$L_k = \mathbf{1}_k^T \pi_k^T,$$

where $\pi_k$ is the probability vector of size $n_k$ satisfying $A_k^T \pi_k = \pi_k$ and $\mathbf{1}_k$ is the row vector of size $n_k$ consisting of $1$s, and

$$R_k = (I - T)^{-1} S_k L_k.$$
Corollary 4.2.
For any probability distribution $x$ on $E$, $(A^T)^t x$ converges as $t$ tends to infinity, and the limit vector is a probability distribution belonging to the $r$-dimensional space spanned by the canonical embeddings of the $\pi_k$ into $\mathbb{R}^n$.
Remark 4.1.
When not all the $A_k$ are aperiodic, there is a priori no convergence. The technique that we will see in Section 4.5.2 nevertheless allows one to ensure convergence at low cost.
Proof.
We first observe the following result.

Lemma 4.1.

$\lim_{t \to \infty} T^t = 0$, and the convergence is geometric. Moreover, $I - T$ is invertible, and its inverse equals $\sum_{i=0}^{\infty} T^i$.
Indeed, let us examine the structure of $T$ more closely. We will reorder the transient states by strongly connected components, starting with those that, in the quotient graph $G'$, have no incoming edges (ultra-transient components), and sorting by increasing distance from the ultra-transient components. The matrix $T$ is then of the form:

$$T = \begin{pmatrix} T_1 & \ast & \cdots & \ast \\ 0 & T_2 & & \vdots \\ \vdots & & \ddots & \ast \\ 0 & \cdots & 0 & T_s \end{pmatrix},$$

where the $T_i$ are irreducible sub-stochastic, non-stochastic matrices (by convention, a one-dimensional matrix is considered irreducible). By Section 4.5.3, each matrix $T_i$ on the main diagonal has a spectral radius strictly less than $1$. Since the structure of $T$ is block upper triangular, the spectral radius of $T$ is $\max_i \rho(T_i) < 1$. This ensures that $T^t$ converges geometrically to $0$.
Since $1$ is not an eigenvalue of $T$, $I - T$ is invertible. Now, for all $t \ge 1$, we have:

$$(I - T) \sum_{i=0}^{t-1} T^i = I - T^t,$$

hence

$$\sum_{i=0}^{t-1} T^i = (I - T)^{-1} (I - T^t).$$

We deduce that

$$\sum_{i=0}^{\infty} T^i = (I - T)^{-1},$$

which completes the proof of the lemma.
Let us return to our proof. It is now established that $\lim_{t \to \infty} T^t = 0$. With the aperiodicity hypothesis on the $A_k$, we also have $\lim_{t \to \infty} A_k^t = L_k$.
It remains to prove that

$$\lim_{t \to \infty} \sum_{i=0}^{t-1} T^i S_k A_k^{t-1-i} = (I - T)^{-1} S_k L_k.$$

Now, we have

$$\sum_{i=0}^{t-1} T^i S_k A_k^{t-1-i} = \sum_{i=0}^{t-1} T^i S_k L_k + \sum_{i=0}^{t-1} T^i S_k \left( A_k^{t-1-i} - L_k \right).$$

The first term converges to $(I - T)^{-1} S_k L_k$. As for the second, we have:

$$\Big\| \sum_{i=0}^{t-1} T^i S_k \left( A_k^{t-1-i} - L_k \right) \Big\| \le \sum_{i=0}^{\lfloor t/2 \rfloor} \|T^i\| \, \|S_k\| \, \|A_k^{t-1-i} - L_k\| \; + \sum_{i=\lfloor t/2 \rfloor + 1}^{t-1} \|T^i\| \, \|S_k\| \, \|A_k^{t-1-i} - L_k\|.$$

The first of the two right-hand terms tends to $0$ because of the geometric convergence of $A_k^t$ to $L_k$ (in that range, $t - 1 - i \ge t/2 - 1$), the second because of the (also geometric) convergence of $T^t$ to $0$. This ensures the convergence to $0$ of the left-hand term.
□
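Lemma 4.1 can be checked numerically on a small hypothetical transient block (rows summing to less than $1$):

```python
import numpy as np

# Hypothetical sub-stochastic transient block T (each row sums to 0.7).
T = np.array([[0.3, 0.4],
              [0.2, 0.5]])

# T^t -> 0 geometrically (spectral radius here is 0.7)...
assert np.allclose(np.linalg.matrix_power(T, 200), 0)

# ...and the Neumann series sum of T^i equals (I - T)^{-1}.
neumann = sum(np.linalg.matrix_power(T, i) for i in range(200))
assert np.allclose(neumann, np.linalg.inv(np.eye(2) - T))
print(neumann)
```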
4.5.2 Periodic Matrices
If a stochastic matrix is periodic, there is a priori no convergence. One can for example think of the circulant matrix

$$C = \begin{pmatrix} 0 & 1 & & 0 \\ & \ddots & \ddots & \\ & & \ddots & 1 \\ 1 & & & 0 \end{pmatrix}.$$

The successive iterates of $C$ describe an orbit of size $n$, corresponding to the $n$-th roots of unity, and in particular there is no convergence in the classical sense, although there exists convergence in the Cesaro sense ($\frac{1}{t} \sum_{i=0}^{t-1} C^i$ converges).
Cesaro convergence could allow us to recover the eigenvector direction associated with the maximal positive eigenvalue, but this is not necessary: as shown by Theorem 4.4, it is possible to reduce to “classical” convergence.
Theorem 4.4.
Let $A$ be an irreducible stochastic matrix of size $n$, possibly periodic. Let $\pi$ be the unique probability vector such that $A^T \pi = \pi$. For all $c \in (0, 1)$, we have

$$\lim_{t \to \infty} \left( \left( cI + (1 - c) A \right)^T \right)^t = \pi \mathbf{1}.$$
Corollary 4.3.
Let $x_0$ be an arbitrary probability vector. If we set $x_{t+1} = c\, x_t + (1 - c)\, A^T x_t$, then

$$\lim_{t \to \infty} x_t = \pi.$$
Proof.
The matrix $B$ defined by $B = cI + (1 - c)A$ is stochastic, irreducible, and aperiodic, due to the presence of self-loops of length $1$. $(B^T)^t$ therefore converges to $\pi' \mathbf{1}$, where $\pi'$ is the unique probability distribution satisfying $B^T \pi' = \pi'$. Since $B^T \pi = c\,\pi + (1 - c)\, A^T \pi = \pi$, we have $\pi' = \pi$.
□
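The self-loop trick of Theorem 4.4 can be illustrated on the 3-state circulant permutation matrix, whose plain powers cycle forever; a sketch with $c = 1/2$:

```python
import numpy as np

# Period-3 circulant permutation matrix: no classical convergence.
C = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 0, 0]], dtype=float)
assert not np.allclose(np.linalg.matrix_power(C, 100),
                       np.linalg.matrix_power(C, 101))

# Adding self-loops (c = 1/2) makes the chain aperiodic without
# changing the stationary vector, here uniform on the 3-cycle.
B = 0.5 * np.eye(3) + 0.5 * C
limit = np.linalg.matrix_power(B.T, 200)
pi = np.ones(3) / 3
assert np.allclose(limit, np.outer(pi, np.ones(3)))   # limit = pi * 1
```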
4.5.3 Sub-Stochastic Matrices
The case of strictly sub-stochastic matrices seems a priori simple to resolve:
Theorem 4.5.
Let $S$ be a sub-stochastic, non-stochastic, irreducible matrix of size $n$. Then the iterated powers of this matrix tend to $0$:

$$\lim_{t \to \infty} S^t = 0.$$
Proof.
$S$ is dominated by, and not equal to, an irreducible stochastic matrix. By part (d) of the Perron-Frobenius theorem (see Appendix A), its spectral radius $\rho$ is strictly less than $1$, which ensures the result: the convergence is dominated by a geometric sequence with ratio $\rho$.
□
Remark 4.2.
A sub-stochastic but non-stochastic matrix corresponds to an ill-defined Markov chain, in the sense that not all possible transitions have been specified. By analogy with automata, we will speak of an incomplete Markov chain9. In this case, for any probability distribution $x$, $\lim_{t \to \infty} (S^T)^t x = 0$, a result that is unsatisfactory in terms of useful information. This problem is crucial in the context of PageRank computations, and the solutions for “completing” a sub-stochastic matrix will be given in more detail in Chapter 5.
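Remark 4.2 and its footnote can be illustrated on a hypothetical 2-state example: the powers of an incomplete (sub-stochastic) chain lose all their mass, and the canonical sink completion recovers a stochastic chain in which the sink absorbs everything:

```python
import numpy as np

# Hypothetical incomplete chain: the second state loses half its mass.
S = np.array([[0.5, 0.5],
              [0.3, 0.2]])          # second row sums only to 0.5

# Powers of a sub-irreducible matrix vanish (Theorem 4.5)...
assert np.allclose(np.linalg.matrix_power(S, 200), 0)

# ...so we complete the chain with a sink state absorbing the deficiency.
deficiency = 1.0 - S.sum(axis=1)    # stochastic deficiency of each state
A = np.zeros((3, 3))
A[:2, :2] = S
A[:2, 2] = deficiency
A[2, 2] = 1.0                       # the sink points surely to itself
assert np.allclose(A.sum(axis=1), 1.0)

pi = np.array([1.0, 0.0, 0.0])
for _ in range(500):
    pi = A.T @ pi
assert np.allclose(pi, [0.0, 0.0, 1.0])  # all the mass ends in the sink
```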
Remark 4.3.
Theorem 4.5 can in fact be applied to any sub-stochastic matrix that is dominated by and not equal to an irreducible stochastic matrix. We will call such matrices sub-irreducible matrices.
However, as we will see in Chapter 5, the study of the maximal eigenvalue of a sub-irreducible matrix and its associated eigenspace is important for the study of PageRank, which is why we will develop this a bit further.
Maximal Eigenvalue and Associated Eigenspace of a Nonnegative Matrix
Theorem 4.6.
Let $A$ be a nonnegative matrix10.
Let $C_1, \dots, C_m$ be the strongly connected components of the graph $G$ associated with $A$.
We define the spectral radius $\rho_i$ of a component $C_i$ as the spectral radius of the restriction $A_{|C_i}$. We will call a pseudo-recurrent component any component $C_i$ satisfying:
- $\rho_i$ is maximal: $\forall j$, $\rho_j \le \rho_i$;
- the components accessible from $C_i$ have a strictly smaller spectral radius: $\forall C_j \subseteq F(C_i)$11 with $C_j \neq C_i$, $\rho_j < \rho_i$.
Then:
- The spectral radius of $A$ is equal to the maximum spectral radius of the strongly connected components of $G$: $\rho(A) = \max_i \rho_i$.
- There exists a maximal eigenvalue that is positive (it is the only one of maximal modulus if the pseudo-recurrent components are aperiodic). The dimension of the associated eigenspace is then equal to the number of pseudo-recurrent components.
- If $C_i$ is a pseudo-recurrent component of $G$, there exists a nonnegative eigenvector with support $F(C_i)$ associated with the maximal eigenvalue.
Corollary 4.4.
If there exists a single pseudo-recurrent component $C$, and if all vertices are accessible from this component ($F(C) = V$), then the eigenspaces associated with the maximal eigenvalues12 are one-dimensional, and there exists a nonnegative eigenvector with support $V$ (hence strictly positive) associated with the maximal positive eigenvalue. We then say that $A$ is pseudo-irreducible.
Proof.
The proof is in fact very similar to that of Theorem 4.3 for non-irreducible stochastic matrices, even though it is no longer possible to have a convergence result for the powers of $A$. Thanks to the partial order induced by the quotient graph of the strongly connected components, it is possible to make $A$ block upper triangular according to the strongly connected components: up to choosing the right permutation of the states, one can write

$$A = \begin{pmatrix} A_1 & A_{1,2} & \cdots & A_{1,m} \\ 0 & A_2 & & \vdots \\ \vdots & & \ddots & A_{m-1,m} \\ 0 & \cdots & 0 & A_m \end{pmatrix},$$

where $A_i = A_{|C_i}$ and the $A_{i,j}$ are the transition matrices between components $C_i$ and $C_j$.
This triangular decomposition ensures that the spectral radius of $A$ is equal to the maximum of the spectral radii of the different components $C_i$. In fact, by applying the Perron-Frobenius theorem to each of the strongly connected components, it turns out that there exists an eigenvalue equal to the spectral radius, and that its algebraic multiplicity equals the number of components with maximal radius.
If $C$ is pseudo-recurrent, then the multiplicity of $\rho = \rho(A)$ in $A_{|F(C)}$ is $1$. There therefore exists, up to scaling, a unique eigenvector $x$ associated with $\rho$ in $(A_{|F(C)})^T$. Since no edge leaves $F(C)$, the embedding $\tilde{x}$ of $x$ into $\mathbb{R}^n$ (completed by zeros) is an eigenvector of $A^T$. By construction, we also have $\operatorname{supp}(\tilde{x}) \subseteq F(C)$. By the Perron-Frobenius theorem applied to $A_{|C}$, the components of $\tilde{x}$ on $C$ are strictly positive up to scaling. Step by step, the equality $A^T \tilde{x} = \rho\, \tilde{x}$ shows that $\tilde{x}$ is strictly positive on all of $F(C)$.
For each pseudo-recurrent component $C$ of $G$, there therefore exists a nonnegative eigenvector with support $F(C)$ associated with $\rho$.
It remains to show that there is no eigenvector associated with $\rho$ outside of the space spanned by the eigenvectors associated with the pseudo-recurrent components. For this, it suffices to observe that any non-pseudo-recurrent component $C'$ with maximal radius generates a triangular structure in the space associated with $\rho$: there then exists a pseudo-recurrent component $C$ with $C \subseteq F(C')$, so that $A$ contains (in a suitable basis) a block of the form

$$\begin{pmatrix} \rho & \ast \\ 0 & \rho \end{pmatrix}, \quad \ast \neq 0,$$

which shows that it is not possible to associate an eigenvector of $A$ to $C'$.
□
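Theorem 4.6 can be illustrated numerically: compute the strongly connected components of a small hypothetical nonnegative matrix, the spectral radius of each, and check that the maximal one gives $\rho(A)$. A sketch (the SCC routine uses mutual reachability, fine for small $n$):

```python
import numpy as np

def sccs(A):
    """Strongly connected components from mutual reachability."""
    n = A.shape[0]
    R = np.linalg.matrix_power(np.eye(n) + (A > 0), n) > 0   # reachability
    comps, seen = [], set()
    for i in range(n):
        if i not in seen:
            comp = [j for j in range(n) if R[i, j] and R[j, i]]
            comps.append(comp)
            seen.update(comp)
    return comps

# Hypothetical nonnegative matrix: component {0, 1} feeds component {2}.
A = np.array([[0.0, 2.0, 1.0],
              [2.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])

comps = sccs(A)
radii = [max(abs(np.linalg.eigvals(A[np.ix_(c, c)]))) for c in comps]

# {0, 1} has radius 2 and {2} has radius 1; the only component reachable
# from {0, 1} has a strictly smaller radius, so {0, 1} is the unique
# pseudo-recurrent component, and rho(A) equals its radius.
assert np.isclose(max(radii), max(abs(np.linalg.eigvals(A))))
print(comps, radii)
```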
4.5.4 General Case: Conclusion
We have just seen that one can arrange to find a convergent iteration even if the matrix is neither aperiodic nor irreducible. Let us simply note that in the latter case, there is no uniqueness of the limit. On the other hand, if the matrix is sub-stochastic but not stochastic, the asymptotic vector will be zero everywhere except on possible recurrent strongly connected components within which the matrix is stochastic. The canonical completion, which consists of adding a “sink” state, is not satisfactory because it does not fundamentally change the result: the “sink” state absorbs the probabilities lost at the sub-stochastic components, but the values on the states other than the sink state are not modified. Fortunately, the various PageRank computation algorithms that we will now study will provide us with other completion methods to reliably find a vector having all the desired properties, which will be called the PageRank vector.
- 1For more comprehensive information on Markov’s work, the reader may refer to [Sal96, She88].
- 2Throughout this thesis, we will make extensive use of the transpose operator. Why not work directly with the matrix $A^T$ and thus avoid introducing $\cdot^T$? First, because it is more comfortable to consider that a coefficient $a_{i,j}$ represents the action of $i$ on $j$ rather than that of $j$ on $i$. Second, because my background from preparatory classes makes me prefer working with column vectors rather than row vectors. Finally, because this is generally the standard notation in the existing literature.
- 3When not otherwise specified, by stochastic matrix we mean a row-stochastic matrix, that is, a nonnegative matrix such that the sum of each of its rows equals $1$.
- 4In fact, there are two eigenvectors, a right one and a left one, the left one of $A$ being the right one of $A^T$ and vice versa.
- 5An alternative solution, proposed by [Col], consists of estimating for each square the probability of having arrived there by 2 consecutive doubles.
- 6Other factors can also alter the transition probabilities:
- The Draw a Chance Card or Pay a Fine of… card, which offers two strategies.
- The Get Out of Jail Free cards, which, if kept by the players, slightly increase the probabilities of drawing a displacement card.
The effect of these variations being relatively small, we allow ourselves not to take them into account here.
- 7For an idea of the results obtained, see [Col]. Note that the results correspond to the international Monopoly game, which has different cards from the French version.
- 8In this case, one can no longer a priori speak of an associated Markov chain.
- 9Indeed, just as with automata, it is possible to complete our Markov chain by adding a sink state receiving the stochastic deficiency of the other states, and pointing surely to itself.
- 10Although the study we are about to undertake is primarily of interest from the point of view of sub-irreducible matrices, it is in fact valid for any nonnegative matrix.
- 11For $C$ a strongly connected component, the filter of $C$, denoted $F(C)$, is the set of vertices of $G$ accessible from $C$.
- 12The plural is used here for possible periodicities. In the aperiodic case, there is of course only one maximal eigenvalue.