Chapter II: The speed of sequential asymptotic learning
2.3 The evolution of public belief
Consider a baseline model, in which each agent observes the private signals of all of her predecessors. In this case the public log-likelihood ratio $\lambda_t$ would equal the sum
\[
\lambda_t = \sum_{i=1}^{t} \ell_i.
\]
Conditioned on the state this is a sum of i.i.d. random variables, and so by the law of large numbers the limit $\lim_t \lambda_t/t$ would, conditioned on (say) $\theta = +1$, equal the conditional expectation of $\ell_t$, which is positive.⁶
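The quantity in footnote 6 can be computed in one line; the following derivation is a sketch, assuming that $F_+$ and $F_-$ admit densities $f_+$ and $f_-$, so that $\ell_t = \log\frac{f_+(s_t)}{f_-(s_t)}$ for the private signal $s_t$ (this notation for the signal is ours, for the purposes of the sketch):
\[
\mathbb{E}(\ell_t \mid \theta = +1) \;=\; \int \log\frac{f_+(s)}{f_-(s)}\, f_+(s)\,ds \;=\; D_{\mathrm{KL}}(F_+ \,\|\, F_-) \;\ge\; 0,
\]
with strict inequality whenever $F_+ \neq F_-$.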
Sub-linear public beliefs
Our first main result shows that when agents observe actions rather than signals, the public log-likelihood ratio grows sub-linearly, and so learning from actions is always slower than learning from signals.
Theorem 4. It holds with probability 1 that $\lim_t \lambda_t/t = 0$.
Our second main result shows that, depending on the choice of private signal distributions, $\lambda_t$ can grow at a rate that is arbitrarily close to linear: given any sub-linear function $f_t$, it is possible to find private signal distributions so that $\lambda_t$ grows as fast as $f_t$.
⁶In fact, $\mathbb{E}(\ell_t \mid \theta = +1)$ is equal to the Kullback-Leibler divergence between $F_+$ and $F_-$, which is positive as long as the two distributions are different.
Theorem 5. For any $f\colon \mathbb{N} \to \mathbb{R}_{>0}$ such that $\lim_t f_t/t = 0$ there exists a choice of CDFs $F_-$ and $F_+$ such that
\[
\liminf_{t\to\infty} \frac{|\lambda_t|}{f_t} > 0 \quad \text{with probability } 1.
\]
For example, for some choice of private signal distributions, $\lambda_t$ grows asymptotically at least as fast as $t/\log t$, which is sub-linear but (perhaps) close to linear.
Long-term behavior of public beliefs
We next turn to estimating more precisely the long-term behavior of the public log-likelihood ratio $\lambda_t$. Since signals are unbounded, agents learn the state, so that $\lambda_t$ tends to $+\infty$ if $\theta = +1$, and to $-\infty$ if $\theta = -1$. In particular, $\lambda_t$ stops changing sign from some $t$ on, with probability 1; all later agents choose the correct action.
We consider without loss of generality the case that $\theta = +1$, so that $\lambda_t$ is positive from some $t$ on. Thus, recalling (2.1), we have that from some $t$ on,
\[
\lambda_{t+1} = \lambda_t + D_+(\lambda_t).
\]
This is the recurrence relation that we need to solve in order to understand the long-term evolution of $\lambda_t$. To this end, we consider the corresponding differential equation:
\[
\frac{df}{dt}(t) = D_+(f(t)).
\]
Recall that $G_-$ is the CDF of the private log-likelihood ratio $\ell_t$, conditioned on $\theta = -1$. We show (Lemma 8) that $D_+(x)$ is well approximated by $G_-(-x)$ for large $x$, in the sense that
\[
\lim_{x\to\infty} \frac{D_+(x)}{G_-(-x)} = 1.
\]
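This approximation is plausible from first principles. The following back-of-the-envelope computation is only a sketch; it assumes that, as is standard in this type of model, $D_+(x)$ is the log-likelihood ratio of observing the action $+1$ when the public log-likelihood ratio is $x$, i.e., $D_+(x) = \log\frac{1-G_+(-x)}{1-G_-(-x)}$ (we do not restate (2.1) here, so this should be read as an assumption of the sketch). Then
\[
D_+(x) \;=\; \log\bigl(1-G_+(-x)\bigr) - \log\bigl(1-G_-(-x)\bigr) \;\approx\; G_-(-x) - G_+(-x) \;\approx\; G_-(-x),
\]
where the first approximation uses $\log(1-u)\approx -u$ for small $u$, and the second uses $G_+(-x) \le e^{-x}\,G_-(-x)$ (on the event $\{\ell_t \le -x\}$ the likelihood ratio $e^{\ell_t}$ is at most $e^{-x}$), so the $G_+$ term is negligible for large $x$.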
In some applications (including the Gaussian one, which we consider below), the expression for $G_-$ is simpler than that for $D_+$, and so one can instead consider the differential equation
\[
\frac{df}{dt}(t) = G_-(-f(t)). \tag{2.2}
\]
This equation can be solved analytically in many cases in which $G_-$ has a simple form. For example, if $G_-(-x) = e^{-x}$ then $f(t) = \log(t+C)$, and if $G_-(-x) = x^{-k}$ then $f(t) = \bigl((k+1)\cdot t + C\bigr)^{1/(k+1)}$.
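Both closed forms follow from separating variables in (2.2); as a quick check, in the polynomial case
\[
\frac{df}{dt} = f^{-k}
\;\Longrightarrow\;
f^{k}\,df = dt
\;\Longrightarrow\;
\frac{f^{k+1}}{k+1} = t + \text{const}
\;\Longrightarrow\;
f(t) = \bigl((k+1)\cdot t + C\bigr)^{1/(k+1)},
\]
and in the exponential case $df/dt = e^{-f}$ gives $e^{f} = t + C$, i.e., $f(t) = \log(t+C)$.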
We show that solutions to this equation have the same long-term behavior as $\lambda_t$, given that $G_-$ satisfies some regularity conditions.
Theorem 6. Suppose that $G_-$ and $G_+$ are continuous, and that the left tail of $G_-$ is convex and differentiable. Suppose also that $f\colon \mathbb{R}_{>0}\to\mathbb{R}_{>0}$ satisfies
\[
\frac{df}{dt}(t) = G_-(-f(t)) \tag{2.3}
\]
for all sufficiently large $t$. Then conditional on $\theta = +1$,
\[
\lim_{t\to\infty}\frac{\lambda_t}{f(t)} = 1 \quad\text{with probability } 1.
\]
The condition⁷ on $G_-$ is satisfied when the random variables $\ell_t$ (i.e., the log-likelihood ratios associated with the private signals), conditioned on $\theta = -1$, have a distribution with a probability density function that is monotone decreasing for all $x$ less than some $x_0$. This is the case for the normal distribution, and for practically every non-atomic distribution one may encounter in the standard probability and statistics literature.
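To make Theorem 6 concrete, here is a minimal numerical sketch (not part of the formal argument). It iterates the recurrence with $D_+$ replaced by $G_-(-\,\cdot\,)$, which Lemma 8 suggests is harmless for large arguments, in the polynomial-tail case $G_-(-x) = x^{-k}$, and compares the result with the closed-form ODE solution $f(t) = ((k+1)\cdot t + C)^{1/(k+1)}$; the parameter values below are arbitrary illustrative choices.

```python
# Illustrative parameters (arbitrary choices)
k = 2.0             # polynomial tail exponent: G_-(-x) = x^{-k} for x >= 1
T = 10**6           # number of steps to iterate
lam = 1.0           # starting value of the public log-likelihood ratio
C = lam ** (k + 1)  # constant matching the ODE solution f(0) to the start

# Iterate lambda_{t+1} = lambda_t + G_-(-lambda_t) = lambda_t + lambda_t^{-k}
for _ in range(T):
    lam += lam ** (-k)

# ODE prediction f(T) = ((k+1)*T + C)^{1/(k+1)}
f_T = ((k + 1) * T + C) ** (1.0 / (k + 1))

print(f"recurrence: {lam:.2f}   ODE: {f_T:.2f}   ratio: {lam / f_T:.5f}")
```

The printed ratio is close to $1$, in line with the theorem's conclusion, though of course this deterministic recurrence is only a stand-in for the random process $\lambda_t$.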
Gaussian signals
In the Gaussian case, $F_+$ is Normal with mean $+1$ and variance $\sigma^2$, and $F_-$ is Normal with mean $-1$ and the same variance. A simple calculation shows that $G_-$ is a Gaussian cumulative distribution function, and so we cannot solve the differential equation (2.2) analytically. However, we can bound the tail $G_-(-x)$ from above and from below by functions of the form $e^{-c\cdot x^2}/x$. For these functions the solution to (2.2) grows like $\sqrt{\log t}$, and so we can use Theorem 6 to deduce the following.
Theorem 7. When private signals are Gaussian, then conditioned on $\theta = +1$,
\[
\lim_{t\to\infty}\frac{\lambda_t}{(2\sqrt{2}/\sigma)\cdot\sqrt{\log t}} = 1 \quad\text{with probability } 1.
\]
Recall that when private signals are observed, the public log-likelihood ratio $\lambda_t$ is asymptotically linear. Thus, learning from actions is far slower than learning from signals in the Gaussian case.
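For completeness, here is a sketch of the computation behind the constant in Theorem 7 (a heuristic, not the proof). With $F_\pm$ Normal with means $\pm 1$ and variance $\sigma^2$, the private log-likelihood ratio of a signal $s$ is $\ell = 2s/\sigma^2$, so conditioned on $\theta = -1$ it is Normal with mean $-2/\sigma^2$ and variance $4/\sigma^2$. Writing $\Phi$ for the standard normal CDF,
\[
G_-(-x) \;=\; \Phi\!\left(\frac{-x + 2/\sigma^2}{2/\sigma}\right) \;=\; \Phi\!\left(\frac{1}{\sigma} - \frac{\sigma x}{2}\right),
\]
which for large $x$ is of order $e^{-\sigma^2 x^2/8}$ up to lower-order factors (the $e^{-c\cdot x^2}/x$ bounds mentioned above). Plugging this into (2.2) gives $f(t)^2 \approx (8/\sigma^2)\log t$, i.e., $f(t) \approx (2\sqrt{2}/\sigma)\sqrt{\log t}$, matching the constant in the theorem.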
⁷By "the left tail of $G_-$ is convex and differentiable" we mean that there is some $x_0$ such that, restricted to $(-\infty, x_0)$, $G_-$ is convex and differentiable.
The expected time to learn
When private signals are unbounded, then with probability 1 the agents eventually all choose the correct action $a_t = \theta$. A natural question is: how long does it take for that to happen? Formally, we define the time to learn
\[
T_L = \min\{t : a_s = \theta \text{ for all } s \ge t\},
\]
and study its expectation. Note that in the baseline case of observed signals $T_L$ has finite expectation, since the probability of a mistake at time $t$ decays exponentially with $t$.
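One way to see this is a standard computation; the following sketch assumes an exponential bound of the form $\mathrm{P}(a_t \neq \theta) \le C e^{-ct}$ in the baseline case. Since $T_L > t$ requires a mistake at some time $s \ge t$,
\[
\mathbb{E}(T_L) \;=\; \sum_{t\ge 0}\mathrm{P}(T_L > t) \;\le\; \sum_{t\ge 0}\sum_{s\ge t}\mathrm{P}(a_s\neq\theta) \;\le\; \sum_{t\ge 0}\sum_{s\ge t} C e^{-cs} \;<\;\infty.
\]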
We first study the expectation of $T_L$ in the case of Gaussian signals. To this end we define the time of first mistake by
\[
T_1 = \min\{t : a_t \neq \theta\}
\]
if $a_t \neq \theta$ for some $t$, and by $T_1 = 0$ otherwise. We calculate a lower bound for the distribution of $T_1$, showing that it decays at most as fast as $1/t$.
Theorem 8. When private signals are Gaussian then for every $\varepsilon > 0$ there exists a $c > 0$ such that for all $t$,
\[
\mathrm{P}(T_1 = t) \;\ge\; \frac{c}{t^{1+\varepsilon}}.
\]
Thus $T_1$ has a very thick tail, decaying far slower than the exponential decay of the baseline case. In particular, $T_1$ has infinite expectation, and so, since $T_L > T_1$, the expectation of the time to learn $T_L$ is also infinite.
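Spelling out the last step: applying Theorem 8 with, say, $\varepsilon = 1/2$ gives
\[
\mathbb{E}(T_1) \;\ge\; \sum_{t\ge 1} t\cdot \mathrm{P}(T_1 = t) \;\ge\; \sum_{t\ge 1} t\cdot\frac{c}{t^{3/2}} \;=\; c\sum_{t\ge 1}\frac{1}{\sqrt{t}} \;=\; \infty.
\]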
In contrast, we show that when private signals have thick tails (that is, when the probability of a strong signal vanishes slowly enough), then the time to learn has finite expectation. In particular, we show this when the left tail of $G_-$ and the right tail of $G_+$ are polynomial.⁸
Theorem 9. Assume that $G_-(-x) = C\cdot x^{-k}$ and that $G_+(x) = 1 - C\cdot x^{-k}$ for some $C > 0$ and $k > 0$, and for all $x$ greater than some $x_0$. Then $\mathbb{E}(T_L) < \infty$.
⁸Recall that $G_-$ is the conditional cumulative distribution function of the private log-likelihood ratios $\ell_t$.
An example of private signal distributions $F_+$ and $F_-$ for which $G_-$ and $G_+$ have this form is given by the probability density functions
\[
f_-(x) \;=\;
\begin{cases}
C\cdot e^{-x}x^{-k-1} & \text{when } 1 \le x,\\
0 & \text{when } -1 < x < 1,\\
C\cdot (-x)^{-k-1} & \text{when } x \le -1,
\end{cases}
\]
and $f_+(x) = f_-(-x)$, for an appropriate choice of normalizing constant $C > 0$. In this case $G_-(-x) = 1 - G_+(x) = \frac{C}{k}\,x^{-k}$ for all $x > 1$.⁹
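As a quick check of the last claim, under the stated form of $f_-$: for $x \ge 1$,
\[
G_-(-x) \;=\; \int_{-\infty}^{-x} C\,(-u)^{-k-1}\,du \;=\; \int_{x}^{\infty} C\,v^{-k-1}\,dv \;=\; \frac{C}{k}\,x^{-k},
\]
and the statement for $1-G_+(x)$ follows from the symmetry $f_+(x) = f_-(-x)$. Note also that with these densities $\log\bigl(f_+(x)/f_-(x)\bigr) = x$ wherever the densities are positive, so the signal is its own log-likelihood ratio and $G_\pm = F_\pm$.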
The proof of Theorem 9 is rather technically involved, and we provide here a rough sketch of the ideas behind it.
We say that there is an upset at time $t$ if $a_{t-1} \neq a_t$. We denote by $\Xi$ the random variable which assigns to each outcome the total number of upsets
\[
\Xi = |\{t : a_{t-1} \neq a_t\}|.
\]
We say that there is a run of length $m$ from time $t$ if $a_t = a_{t+1} = \cdots = a_{t+m-1}$. As we will condition on $\theta = +1$ in our analysis, we say that a run from time $t$ is good if $a_t = +1$ and bad otherwise. A trivial but important observation is that the number of maximal finite runs is equal to the number of upsets, and so, if $\Xi = m$, and if $T_L = t$, then there is at least one run of length at least $t/m$ before time $t$. Qualitatively, this implies that if the number of upsets is small, and if the time to learn is large, then there is at least one long run before the time to learn.
We show that it is indeed unlikely that $\Xi$ is large: the distribution of $\Xi$ has an exponential tail. Incidentally, this holds for any private signal distribution:
Proposition 8. For every private signal distribution there exist $C > 0$ and $0 < \gamma < 1$ such that for all $m > 0$,
\[
\mathrm{P}(\Xi \ge m) \le C\,\gamma^{m}.
\]
Intuitively, this holds because whenever an agent takes the correct action, there is a non-vanishing probability that all subsequent agents will also do so, and no more upsets will occur.
⁹Theorem 9 can be proved for other thick-tailed private signal distributions: for example, one could take different values of $C$ and $k$ for $G_-$ and $G_+$, or one could replace their thick polynomial tails by even thicker logarithmic tails. For the sake of readability we choose to focus on this case.
Thus, it is very unlikely that the number of upsets $\Xi$ is large. As we observe above, when $\Xi$ is small then the time to learn $T_L$ can only be large if at least one of the runs is long. When $G_-$ has a thin tail then this is possible; indeed, Theorem 8 shows that the first finite run has infinite expected length when private signals are Gaussian.
However, when $G_-$ has a thick, polynomial left tail of order $x^{-k}$, we show that it is very unlikely for any run to be long: the probability that there is a run of length $m$ decays at least as fast as $\exp(-c\,m^{1/(k+1)})$, and in particular runs have finite expected length. Intuitively, when strong signals are rare then runs tend to be long, as agents are likely to emulate their predecessor. Conversely, when strong signals are more likely then agents are more likely to break a run, and so runs tend to be shorter.
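A heuristic for this rate (a sketch under the polynomial-tail assumption of Theorem 9, and not the actual proof): after $j$ steps of a bad run the magnitude of the public log-likelihood ratio has grown, by the same reasoning that leads to (2.2), roughly like $((k+1)\,j)^{1/(k+1)}$, so the probability that the next agent receives a signal strong enough to break the run is of order $j^{-k/(k+1)}$. The probability that a run survives $m$ steps is then roughly
\[
\prod_{j=1}^{m}\bigl(1 - c\,j^{-k/(k+1)}\bigr) \;\approx\; \exp\Bigl(-c\sum_{j=1}^{m} j^{-k/(k+1)}\Bigr) \;\approx\; \exp\bigl(-c'\,m^{1/(k+1)}\bigr),
\]
which matches the claimed stretched-exponential decay.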
Putting together these insights, we conclude that it is unlikely that there are many runs, and, in the polynomial signal case, it is unlikely that runs are long. Thus $T_L$ has finite expectation.
Probability of taking the wrong action
Yet another natural metric of the speed of learning is the probability of mistake $q_t = \mathrm{P}(a_t \neq \theta)$. Calculating the asymptotic behavior of $q_t$ seems harder to tackle.
For the Gaussian case, while we cannot estimate $q_t$ precisely, Theorem 8 immediately implies a lower bound: $q_t$ is at least $c/t^{1+\varepsilon}$, for every $\varepsilon > 0$ and a $c$ that depends on $\varepsilon$. This is much larger than the exponentially vanishing probability of mistake in the revealed-signal baseline case.
More generally, we can use Theorem 4 to show that $q_t$ vanishes sub-exponentially for any signal distribution, in the sense that
\[
\lim_{t\to\infty} \frac{1}{t}\log q_t = 0.
\]
To see this, note that the probability of mistake at time $t-1$, conditioned on the observed actions, is exactly equal to $\min\{p_t, 1-p_t\}$, where we recall that
\[
p_t = \mathrm{P}(\theta = +1 \mid a_1, \ldots, a_{t-1}) = \frac{e^{\lambda_t}}{e^{\lambda_t}+1}
\]
is the public belief. This is due to the fact that if the outside observer, who holds belief $p_t$, had to choose an action, she would choose $a_{t-1}$, the action of the last player she observed, a player who has strictly more information than her. Thus
\[
q_{t-1} = \mathbb{E}\bigl(\min\{p_t, 1-p_t\}\bigr) = \mathbb{E}\!\left[\frac{1}{e^{|\lambda_t|}+1}\right],
\]
and since, by Theorem 4, $|\lambda_t|$ is sub-linear, it follows that $q_t$ is sub-exponential.
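To spell out this final step (a short argument filling in the claim above): since $|\lambda_t|/t \to 0$ with probability 1, for every $\varepsilon > 0$ we have $\mathrm{P}(|\lambda_t| \le \varepsilon t) \to 1$, and hence
\[
q_{t-1} \;=\; \mathbb{E}\!\left[\frac{1}{e^{|\lambda_t|}+1}\right] \;\ge\; \mathrm{P}\bigl(|\lambda_t| \le \varepsilon t\bigr)\cdot\frac{1}{e^{\varepsilon t}+1} \;\ge\; \frac{1}{4}\,e^{-\varepsilon t}
\]
for all large enough $t$, so that $\liminf_t \frac{1}{t}\log q_t \ge -\varepsilon$. Since $\varepsilon > 0$ was arbitrary, and since $q_t \le 1$ gives the matching upper bound, the limit is $0$.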