Chapter II: The speed of sequential asymptotic learning
2.3 The evolution of public belief
Consider a baseline model, in which each agent observes the private signals of all of her predecessors. In this case the public log-likelihood ratio $\lambda_t$ would equal the sum
\[
\lambda_t = \sum_{i=1}^{t} \ell_i.
\]
Conditioned on the state this is a sum of i.i.d. random variables, and so by the law of large numbers the limit $\lim_t \lambda_t/t$ would, conditioned on (say) $\theta = +1$, equal the conditional expectation of $\ell_t$, which is positive.⁶
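The quantity in footnote 6 can be computed in one line; the following derivation is a sketch, assuming that $F_+$ and $F_-$ admit densities $f_+$ and $f_-$, so that $\ell_t = \log\frac{f_+(s_t)}{f_-(s_t)}$ for the private signal $s_t$ (this notation for the signal is ours, for the purposes of the sketch):
\[
\mathbb{E}(\ell_t \mid \theta = +1) \;=\; \int \log\frac{f_+(s)}{f_-(s)}\, f_+(s)\,ds \;=\; D_{\mathrm{KL}}(F_+ \,\|\, F_-) \;\ge\; 0,
\]
with strict inequality whenever $F_+ \neq F_-$.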
Sub-linear public beliefs
Our first main result shows that when agents observe actions rather than signals, the public log-likelihood ratio grows sub-linearly, and so learning from actions is always slower than learning from signals.
Theorem 4. It holds with probability 1 that $\lim_t \lambda_t/t = 0$.
Our second main result shows that, depending on the choice of private signal distributions, $\lambda_t$ can grow at a rate that is arbitrarily close to linear: given any sub-linear function $f_t$, it is possible to find private signal distributions so that $\lambda_t$ grows as fast as $f_t$.
⁶In fact, $\mathbb{E}(\ell_t \mid \theta = +1)$ is equal to the Kullback-Leibler divergence between $F_+$ and $F_-$, which is positive as long as the two distributions are different.
Theorem 5. For any $f\colon \mathbb{N} \to \mathbb{R}_{>0}$ such that $\lim_t f_t/t = 0$ there exists a choice of CDFs $F_-$ and $F_+$ such that
\[
\liminf_{t\to\infty} \frac{|\lambda_t|}{f_t} > 0 \quad \text{with probability } 1.
\]
For example, for some choice of private signal distributions, $\lambda_t$ grows asymptotically at least as fast as $t/\log t$, which is sub-linear but (perhaps) close to linear.
Long-term behavior of public beliefs
We next turn to estimating more precisely the long-term behavior of the public log-likelihood ratio $\lambda_t$. Since signals are unbounded, agents learn the state, so that $\lambda_t$ tends to $+\infty$ if $\theta = +1$, and to $-\infty$ if $\theta = -1$. In particular, $\lambda_t$ stops changing sign from some $t$ on, with probability 1; all later agents choose the correct action.
We consider without loss of generality the case that $\theta = +1$, so that $\lambda_t$ is positive from some $t$ on. Thus, recalling (2.1), we have that from some $t$ on,
\[
\lambda_{t+1} = \lambda_t + D_+(\lambda_t).
\]
This is the recurrence relation that we need to solve in order to understand the long-term evolution of $\lambda_t$. To this end, we consider the corresponding differential equation:
\[
\frac{df}{dt}(t) = D_+(f(t)).
\]
Recall that $G_-$ is the CDF of the private log-likelihood ratio $\ell_t$, conditioned on $\theta = -1$. We show (Lemma 8) that $D_+(x)$ is well approximated by $G_-(-x)$ for large $x$, in the sense that
\[
\lim_{x\to\infty} \frac{D_+(x)}{G_-(-x)} = 1.
\]
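This approximation is plausible from first principles. The following back-of-the-envelope computation is only a sketch; it assumes that, as is standard in this type of model, $D_+(x)$ is the log-likelihood ratio of observing the action $+1$ when the public log-likelihood ratio is $x$, i.e., $D_+(x) = \log\frac{1-G_+(-x)}{1-G_-(-x)}$ (we do not restate (2.1) here, so this should be read as an assumption of the sketch). Then
\[
D_+(x) \;=\; \log\bigl(1-G_+(-x)\bigr) - \log\bigl(1-G_-(-x)\bigr) \;\approx\; G_-(-x) - G_+(-x) \;\approx\; G_-(-x),
\]
where the first approximation uses $\log(1-u)\approx -u$ for small $u$, and the second uses $G_+(-x) \le e^{-x}\,G_-(-x)$ (on the event $\{\ell_t \le -x\}$ the likelihood ratio $e^{\ell_t}$ is at most $e^{-x}$), so the $G_+$ term is negligible for large $x$.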
In some applications (including the Gaussian one, which we consider below), the expression for $G_-$ is simpler than that for $D_+$, and so one can instead consider the differential equation
\[
\frac{df}{dt}(t) = G_-(-f(t)). \tag{2.2}
\]
This equation can be solved analytically in many cases in which $G_-$ has a simple form. For example, if $G_-(-x) = e^{-x}$ then $f(t) = \log(t+C)$, and if $G_-(-x) = x^{-k}$ then $f(t) = \bigl((k+1)\cdot t + C\bigr)^{1/(k+1)}$.
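Both closed forms follow from separating variables in (2.2); as a quick check, in the polynomial case
\[
\frac{df}{dt} = f^{-k}
\;\Longrightarrow\;
f^{k}\,df = dt
\;\Longrightarrow\;
\frac{f^{k+1}}{k+1} = t + \text{const}
\;\Longrightarrow\;
f(t) = \bigl((k+1)\cdot t + C\bigr)^{1/(k+1)},
\]
and in the exponential case $df/dt = e^{-f}$ gives $e^{f} = t + C$, i.e., $f(t) = \log(t+C)$.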
We show that solutions to this equation have the same long-term behavior as $\lambda_t$, given that $G_-$ satisfies some regularity conditions.
Theorem 6. Suppose that $G_-$ and $G_+$ are continuous, and that the left tail of $G_-$ is convex and differentiable. Suppose also that $f\colon \mathbb{R}_{>0}\to\mathbb{R}_{>0}$ satisfies
\[
\frac{df}{dt}(t) = G_-(-f(t)) \tag{2.3}
\]
for all sufficiently large $t$. Then conditional on $\theta = +1$,
\[
\lim_{t\to\infty}\frac{\lambda_t}{f(t)} = 1 \quad\text{with probability } 1.
\]
The condition⁷ on $G_-$ is satisfied when the random variables $\ell_t$ (i.e., the log-likelihood ratios associated with the private signals), conditioned on $\theta = -1$, have a distribution with a probability density function that is monotone decreasing for all $x$ less than some $x_0$. This is the case for the normal distribution, and for practically every non-atomic distribution one may encounter in the standard probability and statistics literature.
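To make Theorem 6 concrete, here is a minimal numerical sketch (not part of the formal argument). It iterates the recurrence with $D_+$ replaced by $G_-(-\,\cdot\,)$, which Lemma 8 suggests is harmless for large arguments, in the polynomial-tail case $G_-(-x) = x^{-k}$, and compares the result with the closed-form ODE solution $f(t) = ((k+1)\cdot t + C)^{1/(k+1)}$; the parameter values below are arbitrary illustrative choices.

```python
# Illustrative parameters (arbitrary choices)
k = 2.0             # polynomial tail exponent: G_-(-x) = x^{-k} for x >= 1
T = 10**6           # number of steps to iterate
lam = 1.0           # starting value of the public log-likelihood ratio
C = lam ** (k + 1)  # constant matching the ODE solution f(0) to the start

# Iterate lambda_{t+1} = lambda_t + G_-(-lambda_t) = lambda_t + lambda_t^{-k}
for _ in range(T):
    lam += lam ** (-k)

# ODE prediction f(T) = ((k+1)*T + C)^{1/(k+1)}
f_T = ((k + 1) * T + C) ** (1.0 / (k + 1))

print(f"recurrence: {lam:.2f}   ODE: {f_T:.2f}   ratio: {lam / f_T:.5f}")
```

The printed ratio is close to $1$, in line with the theorem's conclusion, though of course this deterministic recurrence is only a stand-in for the random process $\lambda_t$.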
Gaussian signals
In the Gaussian case, $F_+$ is Normal with mean $+1$ and variance $\sigma^2$, and $F_-$ is Normal with mean $-1$ and the same variance. A simple calculation shows that $G_-$ is a Gaussian cumulative distribution function, and so we cannot solve the differential equation (2.2) analytically. However, we can bound the tail $G_-(-x)$ from above and from below by functions of the form $e^{-c\cdot x^2}/x$. For these functions the solution to (2.2) grows like $\sqrt{\log t}$, and so we can use Theorem 6 to deduce the following.
Theorem 7. When private signals are Gaussian, then conditioned on $\theta = +1$,
\[
\lim_{t\to\infty}\frac{\lambda_t}{(2\sqrt{2}/\sigma)\cdot\sqrt{\log t}} = 1 \quad\text{with probability } 1.
\]
Recall that when private signals are observed, the public log-likelihood ratio $\lambda_t$ is asymptotically linear. Thus, learning from actions is far slower than learning from signals in the Gaussian case.
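For completeness, here is a sketch of the computation behind the constant in Theorem 7 (a heuristic, not the proof). With $F_\pm$ Normal with means $\pm 1$ and variance $\sigma^2$, the private log-likelihood ratio of a signal $s$ is $\ell = 2s/\sigma^2$, so conditioned on $\theta = -1$ it is Normal with mean $-2/\sigma^2$ and variance $4/\sigma^2$. Writing $\Phi$ for the standard normal CDF,
\[
G_-(-x) \;=\; \Phi\!\left(\frac{-x + 2/\sigma^2}{2/\sigma}\right) \;=\; \Phi\!\left(\frac{1}{\sigma} - \frac{\sigma x}{2}\right),
\]
which for large $x$ is of order $e^{-\sigma^2 x^2/8}$ up to lower-order factors (the $e^{-c\cdot x^2}/x$ bounds mentioned above). Plugging this into (2.2) gives $f(t)^2 \approx (8/\sigma^2)\log t$, i.e., $f(t) \approx (2\sqrt{2}/\sigma)\sqrt{\log t}$, matching the constant in the theorem.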
⁷By "the left tail of $G_-$ is convex and differentiable" we mean that there is some $x_0$ such that, restricted to $(-\infty, x_0)$, $G_-$ is convex and differentiable.
The expected time to learn
When private signals are unbounded, then with probability 1 the agents eventually all choose the correct action $a_t = \theta$. A natural question is: how long does it take for that to happen? Formally, we define the time to learn
\[
T_L = \min\{t : a_s = \theta \text{ for all } s \ge t\},
\]
and study its expectation. Note that in the baseline case of observed signals $T_L$ has finite expectation, since the probability of a mistake at time $t$ decays exponentially with $t$.
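One way to see this is a standard computation; the following sketch assumes an exponential bound of the form $\mathrm{P}(a_t \neq \theta) \le C e^{-ct}$ in the baseline case. Since $T_L > t$ requires a mistake at some time $s \ge t$,
\[
\mathbb{E}(T_L) \;=\; \sum_{t\ge 0}\mathrm{P}(T_L > t) \;\le\; \sum_{t\ge 0}\sum_{s\ge t}\mathrm{P}(a_s\neq\theta) \;\le\; \sum_{t\ge 0}\sum_{s\ge t} C e^{-cs} \;<\;\infty.
\]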
We first study the expectation of $T_L$ in the case of Gaussian signals. To this end we define the time of first mistake by
\[
T_1 = \min\{t : a_t \neq \theta\}
\]
if $a_t \neq \theta$ for some $t$, and by $T_1 = 0$ otherwise. We calculate a lower bound for the distribution of $T_1$, showing that it decays at most as fast as $1/t$.
Theorem 8. When private signals are Gaussian then for every $\varepsilon > 0$ there exists a $c > 0$ such that for all $t$,
\[
\mathrm{P}(T_1 = t) \;\ge\; \frac{c}{t^{1+\varepsilon}}.
\]
Thus $T_1$ has a very thick tail, decaying far slower than the exponential decay of the baseline case. In particular, $T_1$ has infinite expectation, and so, since $T_L > T_1$, the expectation of the time to learn $T_L$ is also infinite.
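Spelling out the last step: applying Theorem 8 with, say, $\varepsilon = 1/2$ gives
\[
\mathbb{E}(T_1) \;\ge\; \sum_{t\ge 1} t\cdot \mathrm{P}(T_1 = t) \;\ge\; \sum_{t\ge 1} t\cdot\frac{c}{t^{3/2}} \;=\; c\sum_{t\ge 1}\frac{1}{\sqrt{t}} \;=\; \infty.
\]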
In contrast, we show that when private signals have thick tails (that is, when the probability of a strong signal vanishes slowly enough), then the time to learn has finite expectation. In particular, we show this when the left tail of $G_-$ and the right tail of $G_+$ are polynomial.⁸
Theorem 9. Assume that $G_-(-x) = C\cdot x^{-k}$ and that $G_+(x) = 1 - C\cdot x^{-k}$ for some $C > 0$ and $k > 0$, and for all $x$ greater than some $x_0$. Then $\mathbb{E}(T_L) < \infty$.
⁸Recall that $G_-$ is the conditional cumulative distribution function of the private log-likelihood ratios $\ell_t$.
An example of private signal distributions $F_+$ and $F_-$ for which $G_-$ and $G_+$ have this form is given by the probability density functions
\[
f_-(x) \;=\;
\begin{cases}
C\cdot e^{-x}x^{-k-1} & \text{when } 1 \le x,\\
0 & \text{when } -1 < x < 1,\\
C\cdot (-x)^{-k-1} & \text{when } x \le -1,
\end{cases}
\]
and $f_+(x) = f_-(-x)$, for an appropriate choice of normalizing constant $C > 0$. In this case $G_-(-x) = 1 - G_+(x) = \frac{C}{k}\,x^{-k}$ for all $x > 1$.⁹
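As a quick check of the last claim, under the stated form of $f_-$: for $x \ge 1$,
\[
G_-(-x) \;=\; \int_{-\infty}^{-x} C\,(-u)^{-k-1}\,du \;=\; \int_{x}^{\infty} C\,v^{-k-1}\,dv \;=\; \frac{C}{k}\,x^{-k},
\]
and the statement for $1-G_+(x)$ follows from the symmetry $f_+(x) = f_-(-x)$. Note also that with these densities $\log\bigl(f_+(x)/f_-(x)\bigr) = x$ wherever the densities are positive, so the signal is its own log-likelihood ratio and $G_\pm = F_\pm$.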
The proof of Theorem 9 is rather technically involved, and we provide here a rough sketch of the ideas behind it.
We say that there is an upset at time $t$ if $a_{t-1} \neq a_t$. We denote by $\Xi$ the random variable which assigns to each outcome the total number of upsets
\[
\Xi = |\{t : a_{t-1} \neq a_t\}|.
\]
We say that there is a run of length $m$ from time $t$ if $a_t = a_{t+1} = \cdots = a_{t+m-1}$. As we will condition on $\theta = +1$ in our analysis, we say that a run from time $t$ is good if $a_t = +1$ and bad otherwise. A trivial but important observation is that the number of maximal finite runs is equal to the number of upsets, and so, if $\Xi = m$, and if $T_L = t$, then there is at least one run of length at least $t/m$ before time $t$. Qualitatively, this implies that if the number of upsets is small, and if the time to learn is large, then there is at least one long run before the time to learn.
We show that it is indeed unlikely that $\Xi$ is large: the distribution of $\Xi$ has an exponential tail. Incidentally, this holds for any private signal distribution:
Proposition 8. For every private signal distribution there exist $C > 0$ and $0 < \gamma < 1$ such that for all $m > 0$,
\[
\mathrm{P}(\Xi \ge m) \le C\,\gamma^{m}.
\]
Intuitively, this holds because whenever an agent takes the correct action, there is a non-vanishing probability that all subsequent agents will also do so, and no more upsets will occur.
⁹Theorem 9 can be proved for other thick-tailed private signal distributions: for example, one could take different values of $C$ and $k$ for $G_-$ and $G_+$, or one could replace their thick polynomial tails by even thicker logarithmic tails. For the sake of readability we choose to focus on this case.
Thus, it is very unlikely that the number of upsets $\Xi$ is large. As we observe above, when $\Xi$ is small then the time to learn $T_L$ can only be large if at least one of the runs is long. When $G_-$ has a thin tail then this is possible; indeed, Theorem 8 shows that the first finite run has infinite expected length when private signals are Gaussian.
However, when $G_-$ has a thick, polynomial left tail of order $x^{-k}$, we show that it is very unlikely for any run to be long: the probability that there is a run of length $m$ decays at least as fast as $\exp(-c\,m^{1/(k+1)})$, and in particular runs have finite expected length. Intuitively, when strong signals are rare then runs tend to be long, as agents are likely to emulate their predecessor. Conversely, when strong signals are more likely then agents are more likely to break a run, and so runs tend to be shorter.
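A heuristic for this rate (a sketch under the polynomial-tail assumption of Theorem 9, and not the actual proof): after $j$ steps of a bad run the magnitude of the public log-likelihood ratio has grown, by the same reasoning that leads to (2.2), roughly like $((k+1)\,j)^{1/(k+1)}$, so the probability that the next agent receives a signal strong enough to break the run is of order $j^{-k/(k+1)}$. The probability that a run survives $m$ steps is then roughly
\[
\prod_{j=1}^{m}\bigl(1 - c\,j^{-k/(k+1)}\bigr) \;\approx\; \exp\Bigl(-c\sum_{j=1}^{m} j^{-k/(k+1)}\Bigr) \;\approx\; \exp\bigl(-c'\,m^{1/(k+1)}\bigr),
\]
which matches the claimed stretched-exponential decay.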
Putting together these insights, we conclude that it is unlikely that there are many runs, and, in the polynomial signal case, it is unlikely that runs are long. Thus $T_L$ has finite expectation.
Probability of taking the wrong action
Yet another natural metric of the speed of learning is the probability of mistake $q_t = \mathrm{P}(a_t \neq \theta)$. Calculating the asymptotic behavior of $q_t$ seems harder to tackle.
For the Gaussian case, while we cannot estimate $q_t$ precisely, Theorem 8 immediately implies a lower bound: $q_t$ is at least $c/t^{1+\varepsilon}$, for every $\varepsilon > 0$ and a $c$ that depends on $\varepsilon$. This is much larger than the exponentially vanishing probability of mistake in the revealed-signal baseline case.
More generally, we can use Theorem 4 to show that $q_t$ vanishes sub-exponentially for any signal distribution, in the sense that
\[
\lim_{t\to\infty} \frac{1}{t}\log q_t = 0.
\]
To see this, note that the probability of mistake at time $t-1$, conditioned on the observed actions, is exactly equal to $\min\{p_t, 1-p_t\}$, where we recall that
\[
p_t = \mathrm{P}(\theta = +1 \mid a_1, \ldots, a_{t-1}) = \frac{e^{\lambda_t}}{e^{\lambda_t}+1}
\]
is the public belief. This is due to the fact that if the outside observer, who holds belief $p_t$, had to choose an action, she would choose $a_{t-1}$, the action of the last player she observed, a player who has strictly more information than her. Thus
\[
q_{t-1} = \mathbb{E}\bigl(\min\{p_t, 1-p_t\}\bigr) = \mathbb{E}\!\left[\frac{1}{e^{|\lambda_t|}+1}\right],
\]
and since, by Theorem 4, $|\lambda_t|$ is sub-linear, it follows that $q_t$ is sub-exponential.
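To spell out this final step (a short argument filling in the claim above): since $|\lambda_t|/t \to 0$ with probability 1, for every $\varepsilon > 0$ we have $\mathrm{P}(|\lambda_t| \le \varepsilon t) \to 1$, and hence
\[
q_{t-1} \;=\; \mathbb{E}\!\left[\frac{1}{e^{|\lambda_t|}+1}\right] \;\ge\; \mathrm{P}\bigl(|\lambda_t| \le \varepsilon t\bigr)\cdot\frac{1}{e^{\varepsilon t}+1} \;\ge\; \frac{1}{4}\,e^{-\varepsilon t}
\]
for all large enough $t$, so that $\liminf_t \frac{1}{t}\log q_t \ge -\varepsilon$. Since $\varepsilon > 0$ was arbitrary, and since $q_t \le 1$ gives the matching upper bound, the limit is $0$.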