Suppose a frog lives alone in a small pond with two lily pads, and we record where he is every 5 minutes. The frog cannot swim and can only leap from pad to pad, so at each time point he is on one of the two lily pads; the state space therefore consists of A and B. Between consecutive observations the frog may move to the other lily pad or stay where he is.
Suppose we know the frog is on lily pad A at the initial time, and we want to know the probability that he is on pad A 10 minutes later (that is, after two 5-minute intervals).
We can model this scenario as follows. During the first interval we observe (time 0 to time 1, that is, the first 5 minutes), if the frog is on pad A at time 0, the probability that he stays on pad A is 0.8 and the probability that he jumps to pad B is 0.2. If the frog is on pad B, the probability that he stays on pad B is 0.6 and the probability that he jumps to pad A is 0.4. This model is depicted graphically in Figure 10-6.
[Figure: pads A and B with arrows labeled A to A = 0.8, A to B = 0.2, B to B = 0.6, B to A = 0.4]
Figure 10-6: Transition Probabilities for Frog’s First Move
Alternatively, we can use a matrix to model the probabilities of transition from the state at time 0 (0 minutes) to the state at time 1 (5 minutes). We call this a transition matrix. The row is the starting location and the column is the ending location for each move. The matrix below gives the transition probabilities for the first move, with rows and columns in the order A, B.
Transition matrix for initial state =
          to A   to B
from A    0.8    0.2
from B    0.4    0.6
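As a minimal sketch, the same matrix can be entered in R with labeled rows and columns so that the "row = start, column = end" convention is visible in the object itself. The name `P`, the use of `byrow=TRUE`, and the `dimnames` labels are illustrative choices made here, not part of the original example.

```r
# Rows are the starting pad, columns are the ending pad.
P <- matrix(c(0.8, 0.2,   # from A: stay on A, jump to B
              0.4, 0.6),  # from B: jump to A, stay on B
            nrow = 2, byrow = TRUE,
            dimnames = list(from = c("A", "B"), to = c("A", "B")))

# Each row is a probability distribution over the next pad,
# so every entry lies in [0, 1] and every row sums to 1.
stopifnot(all(P >= 0), all(P <= 1))
stopifnot(isTRUE(all.equal(unname(rowSums(P)), c(1, 1))))

P["A", "B"]  # probability of jumping from A to B
```

Checking the row sums is a cheap way to catch a matrix that was accidentally filled in the wrong order.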
To determine how the frog could be on pad A after 2 time periods given that he is on pad A in the initial state, observe that there are two possibilities. The first is that the frog stays on pad A during both the first and second transitions. The second is that the frog goes from pad A to pad B and then back to pad A.
The total probability of being on pad A after two time periods, given that he started on pad A, is the sum of the probabilities of these two possibilities, because they are mutually exclusive (disjoint) events. Within each possibility, the second move depends only on where the frog is after the first move and not on how he got there, so the probability of the whole path is the product of the two transition probabilities. Notice that in the matrix this is the first row times the first column.
P(Frog on A after two time intervals, given he starts on A)
= P(A→A)P(A→A)+P(A→B)P(B→A)
=(0.8*0.8)+(0.2*0.4)
=0.64+0.08=0.72
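The calculation above can be checked with a short R sketch that adds up the two paths explicitly and then compares the result with the "first row times first column" product. The labeled matrix `P` and the variable names are assumptions made for this illustration.

```r
P <- matrix(c(0.8, 0.2,
              0.4, 0.6),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("A", "B"), c("A", "B")))

# Probability of each two-move path that starts and ends on A
p_AAA <- P["A", "A"] * P["A", "A"]  # A -> A -> A
p_ABA <- P["A", "B"] * P["B", "A"]  # A -> B -> A

# The two paths are mutually exclusive, so their probabilities add
p_two_step <- p_AAA + p_ABA

# Same number as (first row of P) times (first column of P)
p_dot <- sum(P["A", ] * P[, "A"])

c(by_paths = p_two_step, by_dot_product = p_dot)
```

Both entries equal 0.72, which is why a full matrix product gives all four two-step probabilities at once.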
To get the two-step transition matrix (from t=0 to t=2) we can use R to multiply the first transition matrix by itself, using the logic described above.
> frog1<-matrix(c(0.8,0.4,0.2,0.6),nrow=2,ncol=2)
> frog1
     [,1] [,2]
[1,]  0.8  0.2
[2,]  0.4  0.6
> frog2<-frog1%*%frog1
> frog2
     [,1] [,2]
[1,] 0.72 0.28
[2,] 0.56 0.44
Figure 10-7 depicts graphically the new transition probabilities.
Figure 10-7: Transition Probabilities for Frog’s First Two Moves
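One way to build intuition for the two-step probabilities is to simulate many frogs and count where they end up. The sketch below is only an illustration: the helper `simulate_end`, the labeled matrix `P`, the seed, and the number of replications are all hypothetical choices, and the simulated value will differ slightly from the exact one.

```r
P <- matrix(c(0.8, 0.2,
              0.4, 0.6),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("A", "B"), c("A", "B")))

# Hypothetical helper: return the pad the frog is on after n_steps moves
simulate_end <- function(P, start, n_steps) {
  pad <- start
  for (i in seq_len(n_steps)) {
    # Draw the next pad using the row of P for the current pad
    pad <- sample(colnames(P), size = 1, prob = P[pad, ])
  }
  pad
}

set.seed(42)  # arbitrary seed, only so the run is reproducible
ends <- replicate(10000, simulate_end(P, start = "A", n_steps = 2))

est <- mean(ends == "A")        # simulated P(on A after two moves | start on A)
exact <- (P %*% P)["A", "A"]    # exact value from the matrix product
c(simulated = est, exact = exact)
```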
Let’s consider the longer-term behavior of the chain, as described by successive powers of the transition probability matrix. As we increase the power of the transition matrix (time 3, time 4, ...) a peculiar thing happens:
> frog3<-frog2%*%frog1
> frog3
      [,1]  [,2]
[1,] 0.688 0.312
[2,] 0.624 0.376
> frog4<-frog3%*%frog1
> frog4
       [,1]   [,2]
[1,] 0.6752 0.3248
[2,] 0.6496 0.3504
> frog5<-frog4%*%frog1
> frog6<-frog5%*%frog1
> frog7<-frog6%*%frog1
> frog8<-frog7%*%frog1
> frog9<-frog8%*%frog1
> frog10<-frog9%*%frog1
> frog10
          [,1]      [,2]
[1,] 0.6667016 0.3332984
[2,] 0.6665968 0.3334032
> frog11<-frog10%*%frog1
> frog12<-frog11%*%frog1
> frog13<-frog12%*%frog1
> frog14<-frog13%*%frog1
> frog15<-frog14%*%frog1
> frog15
         [,1]      [,2]
[1,] 0.666667 0.3333330
[2,] 0.666666 0.3333340
> frog16<-frog15%*%frog1
> frog17<-frog16%*%frog1
> frog18<-frog17%*%frog1
> frog19<-frog18%*%frog1
> frog20<-frog19%*%frog1
> frog20
          [,1]      [,2]
[1,] 0.6666667 0.3333333
[2,] 0.6666667 0.3333333
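The repeated multiplications above can be wrapped in a small loop. The following is a minimal sketch; `mat_power` is a hypothetical helper written for this illustration (base R has no built-in matrix power operator), and `frog1` is redefined so the block runs on its own.

```r
frog1 <- matrix(c(0.8, 0.4, 0.2, 0.6), nrow = 2, ncol = 2)

# Hypothetical helper: n-step transition matrix by repeated multiplication
mat_power <- function(P, n) {
  result <- diag(nrow(P))     # identity matrix = zero-step transitions
  for (i in seq_len(n)) {
    result <- result %*% P
  }
  result
}

frog20 <- mat_power(frog1, 20)
frog21 <- mat_power(frog1, 21)

# Largest change in any entry between 20 and 21 steps
max(abs(frog21 - frog20))
```

Looking at the largest entry-wise change between successive powers is a simple way to judge how close the matrix is to its limit.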
After around 20 intervals the transition matrix no longer changes at the precision shown. At this point, the probability that the frog ends on pad A is the same (about 0.667) whether he started on A or on B, and the probability that he ends on pad B is the same (about 0.333) from either start, as depicted in Figure 10-8.
[Figure: pads A and B; arrows into A labeled 0.666, arrows into B labeled 0.333]
Figure 10-8: Transition Matrix for Frog’s First 20 Jumps
After 20 jumps the frog’s position follows a so-called stationary distribution: the chain has converged, and further jumps no longer change the probabilities of finding him on pad A or pad B.
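A stationary distribution is one that a further move leaves unchanged, which suggests solving for it directly rather than multiplying many times. The sketch below assumes the chain has a single stationary distribution (true for this two-pad example); the helper name `stationary` and the variable `pi_hat` are hypothetical.

```r
frog1 <- matrix(c(0.8, 0.4, 0.2, 0.6), nrow = 2, ncol = 2)

# Hypothetical helper: solve pi %*% P = pi together with sum(pi) = 1
stationary <- function(P) {
  n <- nrow(P)
  A <- t(P) - diag(n)          # (t(P) - I) pi = 0, with pi as a column vector
  A[n, ] <- 1                  # replace one redundant equation by sum(pi) = 1
  b <- c(rep(0, n - 1), 1)
  solve(A, b)
}

pi_hat <- stationary(frog1)
pi_hat                # stationary probabilities for pads A and B
pi_hat %*% frog1      # one more move leaves the distribution unchanged
```

For this matrix the solution is (2/3, 1/3), matching the rows of the 20-step matrix.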
In the beginning we didn’t really mention a “starting probability distribution” for the probability of the frog being on pad A or B at the beginning of the chain. We specified only the transition probabilities for his moving from pad to pad. Let’s say there is an equal probability of starting on A or B (p=0.5 for each). We can update this distribution by multiplying it by each transition matrix.
> start0<-matrix(c(0.5,0.5),nrow=1,ncol=2)
> start0
     [,1] [,2]
[1,]  0.5  0.5
> start1<-start0%*%frog1
> start1
     [,1] [,2]
[1,]  0.6  0.4
> start2<-start1%*%frog2
> start2
[,1] [,2]
[1,] 0.656 0.344
> start3<-start2%*%frog3
> start3
[,1] [,2]
[1,] 0.665984 0.334016
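After these three updates the distribution is already close to the stationary distribution (2/3, 1/3). Note that frog2 and frog3 are themselves two-step and three-step matrices, so the three updates advance the chain by 1, 2, and 3 moves, for 6 moves in total.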
What if we used a different starting probability distribution? Let’s use p=0.1 for the frog starting on pad A and p=0.9 for the frog starting on pad B and see what happens when we update this with the transition matrices:
> altstart0<-matrix(c(0.1,0.9),nrow=1,ncol=2)
> altstart0
     [,1] [,2]
[1,]  0.1  0.9
> altstart1<-altstart0%*%frog1
> altstart1
     [,1] [,2]
[1,] 0.44 0.56
> altstart2<-altstart1%*%frog2
> altstart2
[,1] [,2]
[1,] 0.6304 0.3696
> altstart3<-altstart2%*%frog3
> altstart3
[,1] [,2]
[1,] 0.6643456 0.3356544
This alternative starting distribution also converges to the stationary probabilities. This can be interpreted as the process eventually “forgetting” the starting condition: no matter where it starts, it converges to the same stationary distribution, which is determined by the transition probabilities.
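The "forgetting" behavior can be illustrated by advancing both starting distributions one move at a time and comparing them after the same number of moves. This is a sketch under the assumption that each multiplication by `frog1` represents exactly one 5-minute interval; `dist_after` is a hypothetical helper, and the choice of 20 moves is arbitrary.

```r
frog1 <- matrix(c(0.8, 0.4, 0.2, 0.6), nrow = 2, ncol = 2)

# Hypothetical helper: advance a starting distribution one move at a time
dist_after <- function(start, P, n_steps) {
  d <- matrix(start, nrow = 1)
  for (i in seq_len(n_steps)) {
    d <- d %*% P              # one multiplication = one 5-minute interval
  }
  d
}

even   <- dist_after(c(0.5, 0.5), frog1, 20)
skewed <- dist_after(c(0.1, 0.9), frog1, 20)

rbind(even, skewed)           # both rows are close to (2/3, 1/3)
max(abs(even - skewed))       # remaining gap between the two starts
```

Stepping with the one-move matrix keeps the move count explicit, so the two starting distributions are compared at the same point in time.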
Several important concepts are illustrated by this simple example: the Markov chain as a probabilistic model; the states of the Markov chain (lily pad A or B at a given time); transition probabilities between states; how to use a matrix to display transition probabilities; how to manipulate the matrix mathematically (using R) to compute transition probabilities over k>1 steps; and the idea of convergence to a stationary distribution.
Let’s investigate these concepts with more mathematical detail using a second, slightly more complex model involving a DNA sequence.