4.6 Generalized Linear Models
4.6.2 Poisson Regression on the Bikeshare Data
To overcome the inadequacies of linear regression for analyzing the Bikeshare data set, we will make use of an alternative approach, called Poisson regression. Before we can talk about Poisson regression, we must first introduce the Poisson distribution.
Suppose that a random variable Y takes on nonnegative integer values, i.e. Y ∈ {0, 1, 2, . . .}. If Y follows the Poisson distribution, then
$$\Pr(Y = k) = \frac{e^{-\lambda}\lambda^{k}}{k!} \quad \text{for } k = 0, 1, 2, \ldots \tag{4.35}$$
Here, λ > 0 is the expected value of Y, i.e. E(Y). It turns out that λ also equals the variance of Y, i.e. λ = E(Y) = Var(Y). This means that if Y follows the Poisson distribution, then the larger the mean of Y, the larger its variance. (In (4.35), the notation k!, pronounced “k factorial”, is defined as k! = k × (k − 1) × (k − 2) × . . . × 3 × 2 × 1.)
The Poisson distribution is typically used to model counts; this is a natural choice for a number of reasons, including the fact that counts, like the Poisson distribution, take on nonnegative integer values. To see how we might use the Poisson distribution in practice, let Y denote the number of users of the bike sharing program during a particular hour of the day, under a particular set of weather conditions, and during a particular month of the year. We might model Y as a Poisson distribution with mean E(Y) = λ = 5. This means that the probability of no users during this particular hour is $\Pr(Y = 0) = \frac{e^{-5} 5^{0}}{0!} = e^{-5} = 0.0067$ (where 0! = 1 by convention). The probability that there is exactly one user is $\Pr(Y = 1) = \frac{e^{-5} 5^{1}}{1!} = 5e^{-5} = 0.034$, the probability of two users is $\Pr(Y = 2) = \frac{e^{-5} 5^{2}}{2!} = 0.084$, and so on.
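The probabilities above can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `poisson_pmf` is our own, and the sums for the mean and variance are truncated at k = 100, which is an assumption that works because the tail of a Poisson distribution with λ = 5 is negligible there.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Pr(Y = k) for a Poisson random variable with mean lam, as in (4.35)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 5.0
for k in range(3):
    print(f"Pr(Y = {k}) = {poisson_pmf(k, lam):.4f}")

# Numerical check that the mean and the variance both equal lam.
# The infinite sums are truncated at k = 100 (an assumption; the
# remaining tail probability is negligible for lam = 5).
ks = range(101)
mean = sum(k * poisson_pmf(k, lam) for k in ks)
var = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in ks)
print(f"mean = {mean:.4f}, variance = {var:.4f}")
```

Both printed summaries should agree with λ = 5 up to rounding, which illustrates the property λ = E(Y) = Var(Y).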
Of course, in reality, we expect the mean number of users of the bike sharing program, λ = E(Y), to vary as a function of the hour of the day, the month of the year, the weather conditions, and so forth. So rather than modeling the number of bikers, Y, as a Poisson distribution with a fixed mean value like λ = 5, we would like to allow the mean to vary as a function of the covariates. In particular, we consider the following model for the mean λ = E(Y), which we now write as λ(X1, . . . , Xp) to emphasize that it is a function of the covariates X1, . . . , Xp:
$$\log(\lambda(X_1, \ldots, X_p)) = \beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p \tag{4.36}$$
or equivalently
$$\lambda(X_1, \ldots, X_p) = e^{\beta_0 + \beta_1 X_1 + \cdots + \beta_p X_p}. \tag{4.37}$$
Here, β0, β1, . . . , βp are parameters to be estimated. Together, (4.35) and (4.36) define the Poisson regression model. Notice that in (4.36), we take the log of λ(X1, . . . , Xp) to be linear in X1, . . . , Xp, rather than having λ(X1, . . . , Xp) itself be linear in X1, . . . , Xp; this ensures that λ(X1, . . . , Xp) takes on nonnegative values for all values of the covariates.
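To see why the log in (4.36) matters, the following sketch evaluates the linear predictor and the resulting mean for a few covariate rows. The intercept, slopes, and covariate values are hypothetical, chosen only so that some linear predictors come out negative; none of them are taken from the Bikeshare fit.

```python
import math

# Hypothetical intercept, slopes, and covariate rows (illustration only).
beta0 = 1.0
betas = [0.8, -2.5]
rows = [
    [0.2, 0.0],
    [0.9, 0.0],
    [0.2, 1.0],
    [0.1, 2.0],
]

def linear_predictor(x):
    """Right-hand side of (4.36): beta0 + beta1*x1 + ... + betap*xp."""
    return beta0 + sum(b * xj for b, xj in zip(betas, x))

etas = [linear_predictor(x) for x in rows]
lams = [math.exp(eta) for eta in etas]  # (4.37): the mean is always positive

for eta, lam in zip(etas, lams):
    print(f"linear predictor = {eta:6.2f}   lambda = {lam:7.4f}")
```

The last two rows have a negative linear predictor, yet the implied mean stays positive. Had the mean itself been modeled as linear in the covariates, those rows would have produced a negative mean.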
To estimate the coefficients β0, β1, . . . , βp, we use the same maximum likelihood approach that we adopted for logistic regression in Section 4.3.2. Specifically, given n independent observations from the Poisson regression model, the likelihood takes the form
$$\ell(\beta_0, \beta_1, \ldots, \beta_p) = \prod_{i=1}^{n} \frac{e^{-\lambda(x_i)} \lambda(x_i)^{y_i}}{y_i!}, \tag{4.38}$$
where $\lambda(x_i) = e^{\beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}}$, due to (4.37). We estimate the coefficients that maximize the likelihood $\ell(\beta_0, \beta_1, \ldots, \beta_p)$, i.e. that make the observed data as likely as possible.
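As a rough sketch of what maximizing (4.38) involves, the code below simulates data from a Poisson regression model with a single covariate and maximizes the log of the likelihood with Newton-Raphson iterations. Several things here are assumptions made for illustration and are not taken from the text: the optimizer (Newton-Raphson), the synthetic data and its true coefficients, the sample size, and the helper names `sample_poisson`, `log_likelihood`, and `fit_poisson`. It is not a fit to the Bikeshare data.

```python
import math
import random

def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k

def log_likelihood(b0, b1, xs, ys):
    """Log of (4.38) for a model with one covariate."""
    total = 0.0
    for x, y in zip(xs, ys):
        eta = b0 + b1 * x
        total += y * eta - math.exp(eta) - math.lgamma(y + 1)  # lgamma(y+1) = log(y!)
    return total

def fit_poisson(xs, ys, n_iter=25, tol=1e-10):
    """Maximize the log-likelihood over (b0, b1) by Newton-Raphson."""
    b0 = math.log(sum(ys) / len(ys))  # intercept-only estimate as a starting point
    b1 = 0.0
    for _ in range(n_iter):
        lams = [math.exp(b0 + b1 * x) for x in xs]
        # Gradient of the log-likelihood.
        g0 = sum(y - lam for y, lam in zip(ys, lams))
        g1 = sum(x * (y - lam) for x, y, lam in zip(xs, ys, lams))
        # Negative Hessian, the symmetric 2x2 matrix [[a, b], [b, c]].
        a = sum(lams)
        b = sum(x * lam for x, lam in zip(xs, lams))
        c = sum(x * x * lam for x, lam in zip(xs, lams))
        det = a * c - b * b
        # Newton step: (negative Hessian)^{-1} times the gradient.
        step0 = (c * g0 - b * g1) / det
        step1 = (a * g1 - b * g0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1

# Synthetic data with hypothetical true coefficients.
rng = random.Random(0)
true_b0, true_b1 = 1.0, 1.5
xs = [rng.random() for _ in range(2000)]
ys = [sample_poisson(math.exp(true_b0 + true_b1 * x), rng) for x in xs]

b0_hat, b1_hat = fit_poisson(xs, ys)
print(f"estimated b0 = {b0_hat:.3f}, estimated b1 = {b1_hat:.3f}")
```

With this many observations the estimates should land close to the values used to simulate the data. In practice one would normally rely on a statistics library rather than hand-written iterations; for example, statsmodels provides `GLM` with a `Poisson` family.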
We now fit a Poisson regression model to the Bikeshare data set. The results are shown in Table 4.11 and Figure 4.15. Qualitatively, the results are similar to those from linear regression in Section 4.6.1. We again see that bike usage is highest in the spring and fall and during rush hour, and lowest during the winter and in the early morning hours. Moreover, bike usage increases as the temperature increases, and decreases as the weather worsens. Interestingly, the coefficient associated with workingday is statistically significant under the Poisson regression model, but not under the linear regression model.

Term                          Coefficient   Std. error   z-statistic   p-value
Intercept                            4.12         0.01        683.96      0.00
workingday                           0.01         0.00          7.50      0.00
temp                                 0.79         0.01         68.43      0.00
weathersit[cloudy/misty]            -0.08         0.00        -34.53      0.00
weathersit[light rain/snow]         -0.58         0.00       -141.91      0.00
weathersit[heavy rain/snow]         -0.93         0.17         -5.55      0.00

TABLE 4.11. Results for a Poisson regression model fit to predict bikers in the Bikeshare data. The predictors mnth and hr are omitted from this table due to space constraints, and can be seen in Figure 4.15. For the qualitative variable weathersit, the baseline corresponds to clear skies.

FIGURE 4.15. A Poisson regression model was fit to predict bikers in the Bikeshare data set. Left: The coefficients associated with the month of the year. Bike usage is highest in the spring and fall, and lowest in the winter. Right: The coefficients associated with the hour of the day. Bike usage is highest during peak commute times, and lowest overnight.
Some important distinctions between the Poisson regression model and the linear regression model are as follows:
• Interpretation: To interpret the coefficients in the Poisson regression model, we must pay close attention to (4.37), which states that an increase in Xj by one unit is associated with a change in E(Y) = λ by a factor of exp(βj). For example, a change in weather from clear to cloudy skies is associated with a change in mean bike usage by a factor of exp(−0.08) = 0.923, i.e. on average, only 92.3% as many people will use bikes when it is cloudy relative to when it is clear. If the weather worsens further and it begins to rain, then the mean bike usage will further change by a factor of exp(−0.58 − (−0.08)) = exp(−0.5) = 0.607, i.e. on average only 60.7% as many people will use bikes when it is rainy relative to when it is cloudy.
• Mean-variance relationship: As mentioned earlier, under the Poisson model, λ = E(Y) = Var(Y). Thus, by modeling bike usage with a Poisson regression, we implicitly assume that mean bike usage in a given hour equals the variance of bike usage during that hour. By contrast, under a linear regression model, the variance of bike usage always takes on a constant value. Recall from Figure 4.14 that in the Bikeshare data, when biking conditions are favorable, both the mean and the variance in bike usage are much higher than when conditions are unfavorable. Thus, the Poisson regression model is able to handle the mean-variance relationship seen in the Bikeshare data in a way that the linear regression model is not.⁵
• Nonnegative fitted values: There are no negative predictions using the Poisson regression model. This is because the Poisson model itself only allows for nonnegative values; see (4.35). By contrast, when we fit a linear regression model to the Bikeshare data set, almost 10% of the predictions were negative.
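The multiplicative interpretation described in the first point above can be checked directly from the weathersit coefficients in Table 4.11. The snippet is a small sketch; the dictionary name `coef` and its keys are our own labels for the table rows.

```python
import math

# weathersit coefficients copied from Table 4.11 (baseline: clear skies).
coef = {
    "cloudy/misty": -0.08,
    "light rain/snow": -0.58,
    "heavy rain/snow": -0.93,
}

# exp(beta_j) is the multiplicative change in E(Y) relative to the baseline.
for level, beta in coef.items():
    print(f"{level:16s} vs clear: factor {math.exp(beta):.3f}")

# Comparing two non-baseline levels uses the difference of their coefficients.
rain_vs_cloudy = math.exp(coef["light rain/snow"] - coef["cloudy/misty"])
print(f"light rain/snow vs cloudy/misty: factor {rain_vs_cloudy:.3f}")
```

Because the model is linear on the log scale, comparing any two levels of a qualitative variable amounts to exponentiating the difference of their coefficients, which is where the factor exp(−0.5) comes from.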