
CHAPTER 3 Linear Algebra

3.6 Stochastic matrices


$$\begin{bmatrix} 0.1125 & 0.115 & 0 \\ 0.575 & 0.86 & 0 \\ 0.3125 & 0.025 & 1.0 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} 0.1125 \\ 0.575 \\ 0.3125 \end{bmatrix}$$

Thus, after two transitions, the system is in state 1 with probability 0.1125. Note that this probability is larger than the simple probability of staying in state 1 for both steps (which is just 0.25 * 0.25 = 0.0625), because it takes into account the probability of transitioning from state 1 to state 2 and then back from state 2 to state 1, which contributes an additional probability of 0.5 * 0.1 = 0.05.

■

As this example shows, if A is a stochastic matrix, then the [i,j]th element of (A^T)^2 represents the probability of going from state j to state i in two steps (note the order of the indices: the transpose reverses them). Generalizing, the probability of going from state j to state i in k steps is given by the [i,j]th element of (A^T)^k.
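For readers who want to experiment, the following numpy sketch reproduces this computation. The transition matrix A below is inferred from the probabilities quoted in the example above (0.25 of staying in state 1, 0.5 of moving from state 1 to state 2, 0.1 of moving from state 2 back to state 1); it is an assumption insofar as the full example is not repeated here.

```python
import numpy as np

# Transition matrix assumed from the example above: a_ij is the
# probability of moving from state i to state j in one step.
A = np.array([[0.25, 0.50, 0.25],
              [0.10, 0.90, 0.00],
              [0.00, 0.00, 1.00]])

x0 = np.array([1.0, 0.0, 0.0])   # start in state 1 with certainty

# Distribution after k steps: multiply the state vector by (A^T)^k.
xk = np.linalg.matrix_power(A.T, 2) @ x0
print(xk)   # [0.1125 0.575  0.3125], matching the worked example
```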

3.6.2 Eigenvalues of a stochastic matrix

We now present three important results concerning stochastic matrices.

First, every stochastic matrix has an eigenvalue of 1. To prove this, consider an n×n stochastic matrix A and the column vector x = [1 1 ... 1]^T. Then, Ax = x, because each element of Ax is the sum of a row of A multiplied by 1, which, by the definition of a stochastic matrix, is 1. Because a matrix and its transpose have the same eigenvalues, the transpose of a stochastic matrix also has an eigenvalue of 1. However, the eigenvector of the transposed matrix corresponding to this eigenvalue need not be (and rarely is) the 1 vector.
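A few lines of numpy illustrate both halves of this first result, using the same assumed example matrix as above:

```python
import numpy as np

A = np.array([[0.25, 0.50, 0.25],
              [0.10, 0.90, 0.00],
              [0.00, 0.00, 1.00]])

ones = np.ones(3)
print(np.allclose(A @ ones, ones))    # True: A·1 = 1, so 1 is an eigenvalue of A

# A and A^T have the same eigenvalues, so A^T also has eigenvalue 1 ...
vals, vecs = np.linalg.eig(A.T)
print(np.isclose(vals, 1).any())      # True

# ... but its eigenvector for eigenvalue 1 is not the 1 vector.
v = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
print(v / v.sum())                    # [0. 0. 1.]: state 3 is absorbing
```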

Second, every (possibly complex) eigenvalue of a stochastic matrix must have a magnitude no greater than 1. To prove this, consider some diagonal element a_jj, and suppose that it takes the value x. Then, by the definition of a stochastic matrix, the sum of the off-diagonal elements of that row is 1-x. From Gerschgorin's circle theorem, we know that all the eigenvalues lie within a circle in the complex plane centered at x with radius 1-x. The largest-magnitude eigenvalue will be a point on this circle (see Figure 2). Although the truth of the proposition is now evident by inspection, we now formally prove the result.

FIGURE 2. Largest possible eigenvalue of a stochastic matrix. [Figure: a circle in the complex plane centered at x, 0 ≤ x ≤ 1, with radius 1-x; the point on the circle at angle θ has coordinates (x+(1-x)cos θ, (1-x)sin θ).]

Suppose that this point subtends an angle of θ. Then, its coordinates are (x+(1-x)cos θ, (1-x)sin θ). Therefore, its magnitude is ((x+(1-x)cos θ)^2 + ((1-x)sin θ)^2)^(1/2), which simplifies to (x^2 + (1-x)^2 + 2x(1-x)cos θ)^(1/2). Because 2x(1-x) is non-negative for 0 ≤ x ≤ 1, this quantity is maximized when θ = 0, so that we merely have to evaluate x^2 + (1-x)^2 + 2x(1-x). But this expression is just (x+(1-x))^2 = 1, independent of the value of x. Therefore, the largest possible eigenvalue magnitude is 1.
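The result is also easy to check numerically; here is a minimal sanity check on randomly generated stochastic matrices, built by normalizing the rows of non-negative random matrices:

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(1000):
    M = rng.random((5, 5))
    M /= M.sum(axis=1, keepdims=True)   # make each row sum to 1
    # every eigenvalue magnitude is at most 1 (up to rounding error)
    assert np.abs(np.linalg.eigvals(M)).max() <= 1 + 1e-9
print("all eigenvalue magnitudes <= 1")
```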

Third, it can also be shown, although the proof is beyond the scope of this text, that under some mild assumptions³, only one of the eigenvalues of a stochastic matrix is 1, which must therefore also be its dominant eigenvalue.




These three facts lead to a remarkable insight. Recall that we compute one step in the state evolution of a stochastic system described by the transition matrix A by multiplying the state vector by A^T. We already know that, under some mild assumptions, every stochastic transition matrix A has a unique eigenvalue of 1. We also know that a matrix and its transpose have the same eigenvalues. Therefore, A^T also has a unique eigenvalue of 1. What if the state vector were the eigenvector of A^T corresponding to this eigenvalue? Then, one step in the state evolution would leave the state vector unchanged.

That is, a system in a state corresponding to that eigenvector would never leave that state. We denote this special state by the vector π; it is also called the stationary probability distribution of the system. π can be found by solving the system of linear equations given by

$$A^T \pi = \pi \qquad \text{(EQ 18)}$$

Because every stochastic matrix has an eigenvalue of 1, every stochastic system has a stationary probability distribution, which is the eigenvector corresponding to this eigenvalue; under the mild assumptions mentioned above, it is unique. Instead of solving a system of linear equations, we can also compute the stationary probability distribution of A using the power method, because 1 is also its dominant eigenvalue. Putting these results together, the power method can be used to compute the stationary probability distribution of a stochastic matrix. In our study of queueing theory, we will study the conditions on the matrix A that guarantee that its stationary probability distribution is reached independent of the initial state of the system. Roughly speaking, these are the conditions under which the matrix A^T is said to be ergodic.
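As a concrete sketch, the following numpy fragment finds the stationary distribution of a small assumed two-state transition matrix both ways: by the power method, and by solving EQ 18 directly together with the extra condition that the elements of π sum to 1:

```python
import numpy as np

# An assumed two-state transition matrix, for illustration only.
A = np.array([[0.6, 0.4],
              [0.2, 0.8]])

# 1. Power method: repeatedly apply A^T to any initial distribution.
pi = np.array([1.0, 0.0])
for _ in range(100):
    pi = A.T @ pi
print(pi)            # converges to [1/3, 2/3]

# 2. Direct solve of EQ 18: (A^T - I)·pi = 0, together with sum(pi) = 1.
n = A.shape[0]
B = np.vstack([A.T - np.eye(n), np.ones((1, n))])
rhs = np.append(np.zeros(n), 1.0)
pi_exact, *_ = np.linalg.lstsq(B, rhs, rcond=None)
print(pi_exact)      # [1/3, 2/3]
```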

EXAMPLE 25: GOOGLE PAGERANK ALGORITHM

The power technique of finding the dominant eigenvector of a stochastic matrix can be used to rank a set of web pages. More precisely, given a set of web pages, we would like to identify certain pages as being more important than others. A page can be considered important by the recursive definition that (a) many other pages point to it and (b) the pages that point to it are themselves important.

The importance of a page can be quantified according to the actions of a 'random web surfer' who goes from web page i to a linked web page j with probability a_ij. If a page is 'important,' then a random web surfer will be led to that page more often than to other, less-important pages. That is, if we consider a large population of surfers, then at any given time a larger fraction of them will be at a more important page than at a less important one. Treating the ratio of the number of web surfers at a page to the total number of surfers as approximating a probability, we see that the importance of a page is just the stationary probability of being at that page.

To make matters more precise, let the matrix A represent the set of all possible transition probabilities, and let p be the vector whose ith element p_i is the probability of the surfer being at page i at some point. Then, the probability distribution over pages after one time step is A^T p. The dominant eigenvector of A^T is then the 'steady-state' probability distribution of a surfer over the pages. Given that A is a stochastic matrix, we know that this dominant eigenvector exists and that it can be found by the power method.

What remains is to estimate the quantities a_ij. Suppose page i has links to k pages. Then, we set a_ij = 1/k for each page j to which it has a link, and set a_ij = 0 for all other j. This models a surfer moving from a page to one of its linked pages chosen uniformly at random. What if a page has no links? Or if two pages link only to each other? These issues can be approximately modelled by assuming that, with constant probability, the surfer 'teleports' to a randomly chosen page. That is, if there is a link from page i to page j, a_ij = α/n + (1-α)/k, where α is a control parameter; otherwise a_ij = α/n. It can easily be shown that these modified a_ijs form a stochastic matrix, so we can extract the dominant eigenvector, and thus the page rank, using the power method. A slightly modified version of this algorithm is the publicly described algorithm used by Google to rank web pages⁴.
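The following sketch puts the example together, assuming the teleportation model just described. The function name pagerank and the handling of link-less pages (which are sent to a uniformly random page, so that their rows remain stochastic) are our own illustrative choices, not part of the text.

```python
import numpy as np

def pagerank(links, alpha=0.15, iters=100):
    """Rank pages by power iteration on the random-surfer matrix.

    links[i] lists the pages that page i links to; alpha is the
    teleportation probability (the control parameter in the text).
    """
    n = len(links)
    A = np.full((n, n), alpha / n)        # teleport: alpha/n to every page
    for i, out in enumerate(links):
        if out:                           # follow one of k links with prob 1 - alpha
            for j in out:
                A[i, j] += (1 - alpha) / len(out)
        else:                             # page with no links: teleport uniformly
            A[i, :] = 1.0 / n
    pi = np.full(n, 1.0 / n)              # surfers initially spread evenly
    for _ in range(iters):                # power method: pi <- A^T pi
        pi = A.T @ pi
    return pi

# A toy three-page web: pages 0 and 2 link to page 1; page 1 links back to 0.
print(pagerank([[1], [0], [1]]))          # page 1 gets the highest rank
```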

■

3. These assumptions eliminate matrices such as the identity matrix, which is a degenerate Markov matrix in that all its states are absorbing states.

4. A rather curious fact is that Google's algorithm to rank pages, called the PageRank algorithm, was developed, in part, by its co-founder, Larry Page.

