
5.3 Empirical Tests of the Favorite-Longshot Bias

5.3.3 Nonparametric Estimation of Calibration Curves

The previous two approaches have two potential drawbacks. First, they do not reveal the precise shape of the calibration curve. Second, they occasionally place several horses from the same race in one group, even though only one of them can win, so the observations within a group are not independent. These drawbacks motivate our third approach, originally proposed in the recent study by Page and Clemen (2013). The large sample size of our data makes it possible to estimate a calibration curve $\varphi$ nonparametrically, and nonparametric regression provides a useful diagnostic tool for detecting FLB.

Let the data be $\{(X_i, y_i)\}_{i=1}^{n}$, drawn from an unknown joint density $f$. The regression function for $y_i$ on $X_i$ is

$$m(x_0) = E_f[y_i \mid X_i = x_0].$$

We want to estimate $m$ nonparametrically, with minimal assumptions about its structure. The idea of the local linear estimator is to fit the local model

$$y_i = \beta_0 + \beta_1 (X_i - x) + \varepsilon_i$$

through the observations in a neighborhood of $x$. The regressor is $X_i - x$ rather than $X_i$ so that the intercept is the value of the local line at $X_i = x$, namely $m(x) = E[y_i \mid X_i = x]$. Once we obtain the estimates $\hat\beta_0(x)$ and $\hat\beta_1(x)$, we set $\hat m(x) = \hat\beta_0(x)$; the slope $\hat\beta_1(x)$ can be used to estimate $\partial m(x)/\partial x$.
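A minimal Python sketch of this idea, assuming a hard neighborhood $|X_i - x| \le h$ in which every observation gets equal weight (the kernel weighting introduced next is left out). The function name `local_linear_fit`, the bandwidth, and the noise-free test function $m(x) = x^2$ are hypothetical choices for illustration, not part of the original study.

```python
def local_linear_fit(xs, ys, x, h):
    """Fit y_i = b0 + b1 * (X_i - x) by ordinary least squares, using only the
    observations in the neighborhood |X_i - x| <= h.

    Returns (b0, b1): b0 estimates m(x) and b1 estimates dm(x)/dx.
    """
    # Center the regressor at x, so the fitted line's value at X_i = x is b0.
    pts = [(xi - x, yi) for xi, yi in zip(xs, ys) if abs(xi - x) <= h]
    n = len(pts)
    if n < 2:
        raise ValueError("need at least two observations in the neighborhood")
    d_bar = sum(d for d, _ in pts) / n
    y_bar = sum(y for _, y in pts) / n
    s_dd = sum((d - d_bar) ** 2 for d, _ in pts)
    if s_dd == 0:
        raise ValueError("the regressor does not vary within the neighborhood")
    s_dy = sum((d - d_bar) * (y - y_bar) for d, y in pts)
    b1 = s_dy / s_dd         # local slope
    b0 = y_bar - b1 * d_bar  # local intercept = fitted value at X_i = x
    return b0, b1


# Hypothetical noise-free data with m(x) = x**2, so m(0.5) = 0.25 and m'(0.5) = 1.
xs = [i / 1000 for i in range(1001)]
ys = [xi ** 2 for xi in xs]
b0, b1 = local_linear_fit(xs, ys, x=0.5, h=0.05)
print(f"m_hat(0.5) = {b0:.4f}, slope = {b1:.4f}")
```

The intercept comes out close to 0.25 and the slope close to 1; the intercept sits slightly above 0.25 because a straight line cannot follow the curvature of $x^2$ inside the window.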

Fan (1993) extends the idea of local linear regression to a smooth, kernel-weighted local polynomial fit: find $\beta_0$ and $\beta_1$ that minimize

$$\sum_{i=1}^{n} K\!\left(\frac{x_0 - X_i}{h_n}\right)\{y_i - \beta_0 - \beta_1 (x_0 - X_i)\}^2,$$

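where $K$ is a kernel function and $h_n$ is a bandwidth. Let $\hat\beta_0$ and $\hat\beta_1$ be the solution to the weighted least squares problem above. A simple calculation yields

$$\hat\beta_0 = \frac{\sum_{i=1}^{n} W_i y_i}{\sum_{i=1}^{n} W_i},$$

with $W_i$ defined by

$$W_i = K\!\left(\frac{x_0 - X_i}{h_n}\right)\left(s_{n,2} - (x_0 - X_i)\, s_{n,1}\right),$$

where

$$s_{n,l} = \sum_{i=1}^{n} K\!\left(\frac{x_0 - X_i}{h_n}\right)(x_0 - X_i)^l, \qquad l = 0, 1, 2, \ldots$$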

This idea is an extension of Stone (1977) and is similar in spirit to locally weighted scatterplot smoothers (LOWESS; Cleveland, 1979), but is simpler to implement since it does not require the identification of nearest neighbors.
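The closed form for $\hat\beta_0$ can be checked numerically. The sketch below assumes a Gaussian kernel (the text only requires some kernel $K$) and uses hypothetical helper names `fan_beta0` and `wls_beta0`: the first computes $\sum_i W_i y_i / \sum_i W_i$ from $s_{n,1}$ and $s_{n,2}$, the second solves the $2 \times 2$ weighted normal equations directly, and the two should agree up to rounding error.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, one common choice of kernel K (an assumption here)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def fan_beta0(xs, ys, x0, h, kernel=gaussian_kernel):
    """beta0_hat = sum(W_i * y_i) / sum(W_i), where
    W_i = K((x0 - X_i)/h) * (s_{n,2} - (x0 - X_i) * s_{n,1}) and
    s_{n,l} = sum_i K((x0 - X_i)/h) * (x0 - X_i)**l."""
    d = [x0 - xi for xi in xs]
    k = [kernel(di / h) for di in d]
    s1 = sum(ki * di for ki, di in zip(k, d))
    s2 = sum(ki * di * di for ki, di in zip(k, d))
    w = [ki * (s2 - di * s1) for ki, di in zip(k, d)]
    return sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)


def wls_beta0(xs, ys, x0, h, kernel=gaussian_kernel):
    """Same intercept from the weighted normal equations
    [s0 s1; s1 s2] [b0; b1] = [t0; t1], solved with Cramer's rule."""
    d = [x0 - xi for xi in xs]
    k = [kernel(di / h) for di in d]
    s0 = sum(k)
    s1 = sum(ki * di for ki, di in zip(k, d))
    s2 = sum(ki * di * di for ki, di in zip(k, d))
    t0 = sum(ki * yi for ki, yi in zip(k, ys))
    t1 = sum(ki * di * yi for ki, di, yi in zip(k, d, ys))
    return (s2 * t0 - s1 * t1) / (s0 * s2 - s1 * s1)


# Hypothetical data on an evenly spaced design over [0, 1].
xs = [i / 50 for i in range(51)]
ys = [math.sin(3.0 * xi) for xi in xs]
for x0 in (0.0, 0.3, 1.0):
    print(x0, fan_beta0(xs, ys, x0, h=0.1), wls_beta0(xs, ys, x0, h=0.1))
```

One consequence of these weights is that $\sum_i W_i (x_0 - X_i) = s_{n,1}s_{n,2} - s_{n,2}s_{n,1} = 0$, so the estimator reproduces any linear function exactly, even at the edges of the data.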

Page and Clemen (2013) apply this idea to estimate a local regression line at each implied probability $p$ by solving

$$\min_{\beta_0, \beta_1} \sum_{i=1}^{n} K_h(p - p_i)\{y_i - \beta_0 - \beta_1 (p_i - p)\}^2, \qquad (5.4)$$

where $p_i$ represents the $i$-th of $n$ observations used in the estimation, $h$ is the half-width of the estimation window around $p$, and $K_h$ is an Epanechnikov kernel defined by

$$K_h(p - p_i) = \frac{3}{4}\left[1 - \left(\frac{p - p_i}{h}\right)^2\right] \mathbb{1}\{|p - p_i| \le h\}. \qquad (5.5)$$

The estimator of the conditional expectation $E[\omega \mid p]$ is then given by $\hat\beta_0$.
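A minimal sketch of estimator (5.4) with kernel (5.5), assuming plain Python lists of implied probabilities $p_i$ and win indicators $y_i$, and solving the weighted least squares problem at each grid point with Cramer's rule. The function names, the grid, the bandwidth $h = 0.1$, and the simulated, perfectly calibrated data (which treat horses as independent and ignore the race structure) are hypothetical; they are not the settings or data used in this chapter.

```python
import random


def epanechnikov(p, p_i, h):
    """K_h(p - p_i) as in (5.5): (3/4) * (1 - ((p - p_i)/h)**2) if |p - p_i| <= h, else 0."""
    if abs(p - p_i) > h:
        return 0.0
    u = (p - p_i) / h
    return 0.75 * (1.0 - u * u)


def calibration_curve(probs, outcomes, grid, h):
    """Solve (5.4) at each p in grid; return beta0_hat(p), the estimate of E[omega | p].

    probs: implied probabilities p_i; outcomes: y_i (1 if horse i won, else 0).
    Returns nan where the window holds too few weighted observations to fit a line.
    """
    curve = []
    for p in grid:
        k = [epanechnikov(p, p_i, h) for p_i in probs]
        d = [p_i - p for p_i in probs]  # regressor (p_i - p), as in (5.4)
        s0 = sum(k)
        s1 = sum(ki * di for ki, di in zip(k, d))
        s2 = sum(ki * di * di for ki, di in zip(k, d))
        t0 = sum(ki * yi for ki, yi in zip(k, outcomes))
        t1 = sum(ki * di * yi for ki, di, yi in zip(k, d, outcomes))
        det = s0 * s2 - s1 * s1
        if det <= 1e-12 * s0 * s2:
            curve.append(float("nan"))
        else:
            curve.append((s2 * t0 - s1 * t1) / det)  # intercept by Cramer's rule
    return curve


# Hypothetical, perfectly calibrated data: horse i wins with probability p_i.
rng = random.Random(0)
probs = [rng.random() for _ in range(20000)]
outcomes = [1 if rng.random() < p_i else 0 for p_i in probs]
grid = [0.1, 0.3, 0.5, 0.7, 0.9]
for p, est in zip(grid, calibration_curve(probs, outcomes, grid, h=0.1)):
    print(f"p = {p:.1f}  estimate of E[omega | p] = {est:.3f}")
```

With perfectly calibrated simulated data the printed estimates should lie close to the 45-degree line. Returning `nan` for an empty or degenerate window avoids reporting an arbitrary value where the local line is not identified.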

This approach provides precise estimates for very high and very low probabilities, because local linear regression automatically corrects the boundary bias that affects conventional kernel estimators (Fan, 1992, 1993; Fan and Gijbels, 1992).⁹ This is particularly important in our context, since we are primarily interested in whether a calibration curve systematically deviates from perfect calibration at probabilities close to the two boundaries, 0 and 1.
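The boundary point can be illustrated with a small noise-free experiment, assuming an Epanechnikov kernel, $h = 0.1$, and hypothetical data with $E[y \mid p] = p$ exactly. At $p = 0$ every observation in the window lies to the right of the evaluation point, so a local constant (Nadaraya-Watson) estimator, which simply averages them, is pulled upward to roughly $3h/8 = 0.0375$, whereas the local linear intercept recovers 0. The helper names are illustrative.

```python
def epanechnikov(u):
    """Epanechnikov kernel: 0.75 * (1 - u**2) on [-1, 1], zero outside."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def nadaraya_watson(xs, ys, x0, h):
    """Local constant fit: kernel-weighted average of the y_i."""
    k = [epanechnikov((x0 - xi) / h) for xi in xs]
    return sum(ki * yi for ki, yi in zip(k, ys)) / sum(k)


def local_linear(xs, ys, x0, h):
    """Local linear fit: intercept of the kernel-weighted regression of y_i on (x_i - x0)."""
    d = [xi - x0 for xi in xs]
    k = [epanechnikov(di / h) for di in d]
    s0 = sum(k)
    s1 = sum(ki * di for ki, di in zip(k, d))
    s2 = sum(ki * di * di for ki, di in zip(k, d))
    t0 = sum(ki * yi for ki, yi in zip(k, ys))
    t1 = sum(ki * di * yi for ki, di, yi in zip(k, d, ys))
    return (s2 * t0 - s1 * t1) / (s0 * s2 - s1 * s1)


# Hypothetical noise-free data with E[y | p] = p on a grid over [0, 1].
ps = [i / 1000 for i in range(1001)]
ys = list(ps)
print("local constant at p = 0:", nadaraya_watson(ps, ys, 0.0, 0.1))
print("local linear   at p = 0:", local_linear(ps, ys, 0.0, 0.1))
```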

An innovation in Page and Clemen’s (2013) approach is the use of a clustered bootstrap to account for the non-independence of implied probabilities.¹⁰ By using groups of non-independent markets as clusters in the bootstrap resampling, they can estimate a confidence band for the entire calibration curve.
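The clustered bootstrap can be sketched as follows, assuming each race is one cluster stored as a list of `(implied_probability, won)` pairs and assuming a pointwise percentile band (the text does not state how the band is formed from the replications). Whole races are drawn with replacement, so horses from the same race always enter a replicate together. All names, the simulated races, and `n_boot=200` (smaller than the 1,000 replications used in the chapter, to keep the example fast) are hypothetical.

```python
import math
import random


def local_linear(probs, outcomes, p, h):
    """beta0_hat(p) from (5.4) with the Epanechnikov weights of (5.5); nan if underdetermined."""
    s0 = s1 = s2 = t0 = t1 = 0.0
    for p_i, y_i in zip(probs, outcomes):
        if abs(p - p_i) <= h:
            u = (p - p_i) / h
            k = 0.75 * (1.0 - u * u)
            d = p_i - p
            s0 += k
            s1 += k * d
            s2 += k * d * d
            t0 += k * y_i
            t1 += k * d * y_i
    det = s0 * s2 - s1 * s1
    return (s2 * t0 - s1 * t1) / det if det > 1e-12 * s0 * s2 else float("nan")


def curve_from_races(races, grid, h):
    """Pool the horses of all races and estimate the calibration curve on the grid."""
    probs = [p for race in races for p, _ in race]
    outcomes = [y for race in races for _, y in race]
    return [local_linear(probs, outcomes, p, h) for p in grid]


def cluster_bootstrap_band(races, grid, h, n_boot=200, level=0.95, seed=0):
    """Resample whole races (clusters) with replacement; return pointwise percentile bands."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_boot):
        resampled = [rng.choice(races) for _ in range(len(races))]
        draws.append(curve_from_races(resampled, grid, h))
    alpha = (1.0 - level) / 2.0
    lower, upper = [], []
    for j in range(len(grid)):
        vals = sorted(draw[j] for draw in draws if not math.isnan(draw[j]))
        if not vals:  # no replicate could estimate this grid point
            lower.append(float("nan"))
            upper.append(float("nan"))
            continue
        lower.append(vals[int(alpha * (len(vals) - 1))])
        upper.append(vals[math.ceil((1.0 - alpha) * (len(vals) - 1))])
    return lower, upper


def simulate_races(n_races, n_horses, rng):
    """Hypothetical races whose implied probabilities sum to one and are perfectly calibrated."""
    races = []
    for _ in range(n_races):
        raw = [rng.random() for _ in range(n_horses)]
        total = sum(raw)
        probs = [r / total for r in raw]
        winner = rng.choices(range(n_horses), weights=probs)[0]
        races.append([(p, 1 if j == winner else 0) for j, p in enumerate(probs)])
    return races


races = simulate_races(300, 5, random.Random(1))
grid = [0.1, 0.2, 0.3]
estimate = curve_from_races(races, grid, h=0.1)
lower, upper = cluster_bootstrap_band(races, grid, h=0.1)
for p, lo, est, hi in zip(grid, lower, estimate, upper):
    print(f"p = {p:.1f}: {est:.3f}  95% band [{lo:.3f}, {hi:.3f}]")
```

Resampling races rather than individual horses preserves the within-race constraint that exactly one horse wins, which is the source of the non-independence the procedure is meant to respect.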

Figure 5.3 shows the nonparametrically estimated calibration curves $\varphi^s_t$, using implied probabilities at $t \in \{600, 60\}$ seconds before races start. The gray area is the 95% confidence band calculated by the clustered bootstrap procedure with 1,000 replications.

Implied probabilities are well calibrated on the range [0, 0.5], in particular close to the boundary (see Figure D.2 for the shape of the calibration curves for $q \in [0, 0.1]$). The estimated calibration curves $\varphi^s_t$ deviate from perfect calibration outside this range, but the confidence bands there are also wider. This is expected: before a race starts, even the most favored horse usually has an implied probability below 0.5, so few observations fall above this level. The results indicate that FLB is very limited before races start: we observe a slight tendency to underestimate objective probabilities larger than 0.9, but the confidence band is relatively wide in this region because of the smaller sample size (pre-race implied probabilities higher than 0.9 are rare).

⁹ Härdle (1990) documents the boundary problem in kernel estimation.

¹⁰ See Efron and Tibshirani (1993), Härdle and Bowman (1988), and Härdle and Marron (1991).

[Figure 5.3 panels: "600 seconds before start" and "60 seconds before start". Each panel plots the calibration curve and its 95% CI; axes labeled Implied probability and Actual probability, each running from 0 to 1.]

Figure 5.3: Nonparametric estimation of calibration curves $\varphi_t^s$, $t \in \{600, 60\}$ seconds before races start.

[Figure 5.4 panels: "40 seconds before finish", "20 seconds before finish", "10 seconds before finish", and "5 seconds before finish", in the same layout as Figure 5.3.]

Figure 5.4: Nonparametric estimation of calibration curves $\varphi_t^f$, $t \in \{40, 20, 10, 5\}$ seconds before races finish.

Next, we turn to nonparametric estimation of calibration curves $\varphi_t^f$, using implied probabilities at $t \in \{40, 20, 10, 5\}$ seconds before races finish. Figure 5.4 displays the estimated $\varphi_t^f$ with 95% confidence bands. First, we observe limited FLB at 40 seconds before races finish: $\varphi_{40}^f$ coincides with perfect calibration (i.e., the 95% confidence band covers the 45-degree line) for objective probability $q \in [0, 0.25]$.

However, $\varphi_t^f$ exhibits larger deviations from perfect calibration as races approach the finish line ($\varphi_{20}^f(q) > q$ for $q \in [0, 0.11]$ and $\varphi_{20}^f(q) < q$ for $q \in [0.42, 1]$; $\varphi_{10}^f(q) > q$ for $q \in [0, 0.15]$ and $\varphi_{10}^f(q) < q$ for $q \in [0.35, 1]$; $\varphi_5^f(q) > q$ for $q \in [0, 0.21]$ and $\varphi_5^f(q) < q$ for $q \in [0.44, 1]$). Notably, the calibration curves $\varphi_t^f$ have an inverse-S shape and exhibit significant FLB at implied probabilities extremely close to 0 and 1 (see Figure D.3 for the shape of the calibration curves for $q \in [0, 0.1]$).¹¹
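The reading of Figure 5.4, in which $\varphi_t^f$ counts as deviating from perfect calibration wherever the 95% band excludes the 45-degree line, can be expressed as a small helper. The function `deviation_intervals` and the band values in the example are hypothetical, chosen only to illustrate the rule; they are not the estimates behind the intervals reported above. On a finite grid, the reported endpoints are grid points.

```python
def deviation_intervals(grid, lower, upper):
    """Group consecutive grid points where the band excludes the 45-degree line.

    A point q lies 'above' the diagonal if lower(q) > q and 'below' if upper(q) < q;
    otherwise the band covers q and the curve is consistent with perfect calibration.
    Returns a list of (label, first_q, last_q) runs.
    """
    runs = []
    current = None  # run being built: (label, first_q, last_q)
    for q, lo, hi in zip(grid, lower, upper):
        if lo > q:
            label = "above"
        elif hi < q:
            label = "below"
        else:
            label = None
        if label is not None and current is not None and current[0] == label:
            current = (label, current[1], q)  # extend the current run
        else:
            if current is not None:
                runs.append(current)  # close the previous run
            current = (label, q, q) if label is not None else None
    if current is not None:
        runs.append(current)
    return runs


# Hypothetical band values, for illustration only.
grid = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
lower = [0.01, 0.12, 0.15, 0.25, 0.30, 0.35, 0.40]
upper = [0.05, 0.20, 0.25, 0.35, 0.38, 0.45, 0.55]
print(deviation_intervals(grid, lower, upper))
# → [('above', 0.0, 0.1), ('below', 0.4, 0.6)]
```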

Our finding that the degree of FLB is magnified as the remaining time horizon of the event shortens runs contrary to Page and Clemen’s (2013) finding that the FLB weakens as the time to contract expiration shortens. Note, however, that the events studied in Page and Clemen (2013) and in the current study are of different types. Page and Clemen (2013) use data from Intrade markets on future events, mainly political and sports events, which usually have a much longer time span than the horse racing markets examined here.

From the document: Essays in Revealed Preference Theory and Behavioral Economics (pages 121-125)