
3.6. Distribution for Count Data

3.6.6. Regression Model for Count Data

In classical terms, linear regression is also known as least-squares regression. It is a statistical method that allows us to summarize and study the relationship between two continuous variables.

One variable, denoted as X, is regarded as a predictor, explanatory, or independent variable, and the other, denoted as Y, is regarded as the response, outcome, or dependent variable. A linear regression model with a single predictor variable is known as a simple linear regression model (Kutner et al., 2004; Seber and Lee, 2012; Montgomery et al., 2015; Williams, 1959).

In matrix form, the linear regression model is written as:

$$Y = X\beta + \epsilon \tag{3.23}$$

where

$$Y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \quad X = \begin{pmatrix} 1 & x_{11} & \cdots & x_{1k} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{nk} \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{pmatrix}, \quad \epsilon = \begin{pmatrix} \epsilon_1 \\ \vdots \\ \epsilon_n \end{pmatrix}$$

Y is the response variable, β₀ is the intercept, β is the vector of coefficients (unknown parameters that need to be estimated) and ε is the error term (or residual) used to capture the deviation of the data from the model. The aim is to find the values of the parameters β_a (a = 0, 1, …, k) that provide the best fit for the data. The regression must satisfy the following assumptions: a linear relationship, independence of the errors from each other and from the covariates, multivariate normality, no or little multicollinearity, no autocorrelation, and homoscedasticity (the errors must have zero mean and constant variance) (Kutner et al., 2004; Seber and Lee, 2012; Montgomery et al., 2015).

Applying the zero-mean assumption of the errors in equation 3.23, the expectation of the random matrix is defined as:

$$E[Y] = \left[\,E\{Y_{ij}\}\,\right] \tag{3.24}$$

where i = 1, …, n and j = 1, …, p. Least-squares regression describes the behaviour of the location of the conditional distribution, using the mean of the distribution to represent its central tendency.

The residuals ε_i are defined as the differences between the observed and the fitted values.

Minimizing the sum of the squared residuals,

$$\sum_{i=1}^{n} r\!\left(y_i - x_i^{T}\hat{\beta}\right) = \sum_{i=1}^{n} \left(y_i - x_i^{T}\hat{\beta}\right)^{2} \tag{3.25}$$

where $r(\mu) = \mu^{2}$ is the quadratic loss function, gives the least-squares estimator $\hat{\beta}$:

$$\hat{\beta} = \left(X^{T}X\right)^{-1}X^{T}Y \tag{3.26}$$

provided that $X^{T}X$ is invertible.

Further, the additional assumption that the errors ε follow a Gaussian distribution,

$$\epsilon \sim N\!\left(0, \sigma^{2}I_n\right) \tag{3.27}$$

where $I_n$ is the n × n identity matrix, provides a framework for testing the significance of the coefficients found in equation 3.26. Under this assumption, the least-squares estimator is also the maximum likelihood estimator. By taking expectations with respect to ε in equations 3.25 and 3.26, and noting that a linear function of a normally distributed random variable is itself normally distributed, we can rewrite the model as:

$$Y \sim N\!\left(\mu, \sigma^{2}I_n\right), \quad \text{where } \mu = X\beta \tag{3.28}$$

Therefore, the model in equation 3.28 represents the relationship between the mean of 𝑦ᵢ, for i = 1, 2, …, n, and the covariates linearly.
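To make the estimator in equation 3.26 concrete, the following minimal sketch computes β̂ = (XᵀX)⁻¹XᵀY on simulated data; the variable names, sample size, and coefficient values are illustrative, not from the source.

import numpy as np

rng = np.random.default_rng(42)
n = 200
x = rng.normal(size=(n, 2))              # two covariates
X = np.column_stack([np.ones(n), x])     # design matrix with an intercept column
beta_true = np.array([1.0, 0.5, -2.0])   # arbitrary values for the simulation
y = X @ beta_true + rng.normal(scale=0.3, size=n)  # equation 3.23 with Gaussian errors (3.27)

# Least-squares estimator, equation 3.26 (solve is preferred over an explicit inverse)
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Residual variance and standard errors, usable for the significance tests
# mentioned under the Gaussian-error assumption
residuals = y - X @ beta_hat
sigma2 = residuals @ residuals / (n - X.shape[1])
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
print(beta_hat, se)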

3.6.6.2. Generalized Linear Model

The family of generalized linear models (GLMs) provides a collection of models extending basic concepts from linear regression to applications where the error terms follow a wide range of distributions, including the binomial and Poisson for modelling count data (Waller and Gotway, 2004).

Thus, equation 3.28 refers to data that are normally distributed, but it can be generalized to any distribution in the exponential family (Nelder and Baker, 1972; McCullagh and Nelder, 1989). GLMs consist of three components:

1. a probability distribution that belongs to the exponential family of distributions (known as a random component which defines the distribution of error terms)

2. a linear predictor ρᵢ = β₀ + β₁xᵢ₁ + ⋯ + β_k x_ik = xᵢᵀβ (also known as the systematic component, defining the linear combination of explanatory variables), and

3. a link function ϑ which defines the relationship between the systematic and random components, given as E[Yᵢ] = μᵢ = ϑ⁻¹(ρᵢ).

Estimating GLM parameters generally requires an iterative procedure rather than the closed-form solutions available for linear models (McCullagh and Nelder, 1989; Waller and Gotway, 2004). A GLM can be used for data that are not normally distributed and for cases where the relationship between the mean of the response variable and the covariates is not linear. The GLM framework includes many important distributions such as the Gaussian, Poisson, gamma, and inverse Gaussian (Cameron and Trivedi, 2013; Kutner et al., 2004; Seber and Lee, 2012; Montgomery et al., 2015; Waller and Gotway, 2004).
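As an illustration of the iterative fitting such models require, the sketch below implements iteratively reweighted least squares (IRLS) for a Poisson GLM with the canonical log link. This is only one common choice of fitting procedure; the data, tolerance, and iteration cap are made up for the example, and production code would normally call a library routine instead.

import numpy as np

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.uniform(-1, 1, size=n)])
y = rng.poisson(np.exp(X @ np.array([0.3, 1.2])))  # counts from a log-linear mean

beta = np.zeros(X.shape[1])
for _ in range(25):                      # IRLS iterations
    eta = X @ beta                       # linear predictor (systematic component)
    mu = np.exp(eta)                     # inverse of the log link
    W = mu                               # working weights; for the canonical log link, W = mu
    z = eta + (y - mu) / mu              # working response
    beta_new = np.linalg.solve(X.T @ (W[:, None] * X), X.T @ (W * z))
    if np.max(np.abs(beta_new - beta)) < 1e-8:
        beta = beta_new
        break
    beta = beta_new
print(beta)                              # should be close to (0.3, 1.2)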

3.6.6.3. Poisson Regression

This is a special case of the GLM, commonly used to model count data. Poisson regression has been used for modelling count data in many fields, such as public health (Arslan et al., 2013; Duncan et al., 2002; Xiang and Song, 2016), epidemiology (Best et al., 2000; Frome and Checkoway, 1985; Zou, 2004; Gartner et al., 2016; Hanewinckel et al., 2010; Sobngwi et al., 2001), insurance (Boucher and Denuit, 2006; Christiansen and Morris, 1997; Ismail and Jemain, 2007) and many other research areas. The canonical link function is the logarithm.

The model is specified as:

$$\Pr(Y = y) = \frac{e^{-\lambda}\,\lambda^{y}}{y!} \tag{3.29}$$

For λ > 0, the mean and variance of a Poisson distribution are

𝐸(𝑌) = 𝑉𝑎𝑟(𝑌) = 𝜆 (3.30)

The likelihood function is given as

$$L(\beta \mid y, x) = \prod_{i=1}^{N} \Pr(y_i \mid \mu_i) = \prod_{i=1}^{N} \frac{e^{-\mu_i}\,\mu_i^{y_i}}{y_i!} \tag{3.31}$$

The key assumption is that the response Y has a Poisson distribution, Y ~ Pois(λ), with E(Y) = λ and Var(Y) = λ.

With the assumption that the mean is equal to the variance, any factor that affects one also affects the other. This poses a problem when the data exhibit a different behaviour, that is, when the variance exceeds the mean. Thus, the usual assumption of homoscedasticity would not be appropriate for Poisson data (Preston, 2005). Statistically, an important limitation of the Poisson distribution therefore lies in this fixed relationship between the mean and the variance. Most of the approaches proposed for this problem focus on over-dispersion (Ismail and Jemain, 2007; Berk and MacDonald, 2008).
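In practice, the model in equations 3.29 to 3.31 can be fitted with standard software. The minimal sketch below uses the statsmodels package on simulated counts; the variable names, coefficients, and data are illustrative only.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 300
x = rng.uniform(0, 2, size=n)
X = sm.add_constant(x)                   # adds the intercept column
y = rng.poisson(np.exp(0.5 + 0.8 * x))   # counts with a log-linear mean

# Poisson GLM with the canonical log link
result = sm.GLM(y, X, family=sm.families.Poisson()).fit()
print(result.summary())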

3.6.6.4. Negative Binomial

One way to handle the over-dispersion problem posed by Poisson regression is to fit a parametric model that is more dispersed than the Poisson. A natural choice is the negative binomial (NB), given as:

$$P(Y = y_i \mid \mu_i, k) = \frac{\Gamma\!\left(\frac{1}{k} + y_i\right)}{\Gamma\!\left(\frac{1}{k}\right)\, y_i!} \left(\frac{k\mu_i}{1 + k\mu_i}\right)^{y_i} \left(\frac{1}{1 + k\mu_i}\right)^{1/k} \tag{3.32}$$

$$\log\{\mu_i\} = x_i^{T}\beta \tag{3.33}$$

where the parameters μᵢ and k represent the mean and the dispersion of the negative binomial distribution. The respective mean and variance of this model are:

$$E[Y_i] = \exp\{x_i^{T}\beta\} \tag{3.34}$$

$$V[Y_i] = \exp\{x_i^{T}\beta\} + k\,\exp\{x_i^{T}\beta\}^{2} \tag{3.35}$$

The variance of a negative binomial is thus a quadratic function of its mean, and the negative binomial approaches the Poisson(μᵢ) model as k → 0.

The negative binomial PDF can be described as the probability of observing y failures before the kth success in a series of Bernoulli trials. Under this description, k is a positive integer (Hilbe, 2011).

However, there is no compelling mathematical reason to limit this parameter to integers.

The negative binomial is a generalization of Poisson regression: it loosens the highly restrictive assumption that the variance equals the mean. It is based on a Poisson-gamma mixture distribution, and the model is popular because it captures Poisson heterogeneity with a gamma distribution.
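The Poisson-gamma mixture can be checked numerically: drawing a gamma-distributed rate for each observation and then a Poisson count given that rate should reproduce the negative binomial's mean and quadratic variance from equations 3.34 and 3.35. A small simulation sketch follows; the parameter values are arbitrary.

import numpy as np

rng = np.random.default_rng(7)
mu, k = 4.0, 0.5                 # NB mean and dispersion as in equation 3.32
n = 500_000

# lambda_i ~ Gamma with mean mu and variance k*mu^2 (shape 1/k, scale k*mu)
lam = rng.gamma(shape=1.0 / k, scale=k * mu, size=n)
y = rng.poisson(lam)             # Poisson counts given the heterogeneous rates

# Empirically approx. mu and mu + k*mu^2, i.e. about 4 and 12 here
print(y.mean(), y.var())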

Given the negative binomial PDF with parameters (μ, k), where μᵢ now denotes the success probability of the underlying Bernoulli trials rather than the mean:

$$f(y_i \mid \mu, k) = \binom{y_i + k - 1}{k - 1}\, \mu_i^{k}\, (1 - \mu_i)^{y_i} \tag{3.36}$$

or, equivalently,

$$f(y_i \mid \mu, k) = \frac{(y_i + k - 1)!}{y_i!\,(k - 1)!}\, \mu_i^{k}\, (1 - \mu_i)^{y_i} \tag{3.37}$$
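Equation 3.36 matches the standard negative binomial pmf, so it can be verified against scipy.stats.nbinom, whose parameterization nbinom.pmf(y, n, p) corresponds to n = k successes with success probability p = μᵢ here. The values below are arbitrary and chosen only for the check.

import numpy as np
from math import comb
from scipy.stats import nbinom

k, mu = 3, 0.4                    # size and success probability in equation 3.36
y = np.arange(6)

# Equation 3.36: C(y + k - 1, k - 1) * mu^k * (1 - mu)^y
pmf_336 = np.array([comb(yi + k - 1, k - 1) * mu**k * (1 - mu)**yi for yi in y])
print(np.allclose(pmf_336, nbinom.pmf(y, k, mu)))   # True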

Converting the NB PDF into exponential-family form results in

$$f(y_i \mid \mu_i, k) = \exp\left\{ y_i \ln(1 - \mu_i) + k \ln(\mu_i) + \ln\binom{y_i + k - 1}{k - 1} \right\} \tag{3.38}$$

where $y_i \ln(1 - \mu_i)$ contains the canonical link and $k \ln(\mu_i) + \ln\binom{y_i + k - 1}{k - 1}$ is the cumulant.

Thus, the canonical link and cumulant can easily be extracted from a PDF when it is expressed in exponential-family form. This gives:

$$\theta_i = \ln(1 - \mu_i) \;\Rightarrow\; \mu_i = 1 - \exp(\theta_i), \qquad b(\theta_i) = -k \ln(\mu_i) = -k \ln\!\left(1 - \exp(\theta_i)\right), \qquad \alpha_i(\phi) = 1 \;\;(\text{scale}) \tag{3.39}$$

Therefore, the first and second derivatives of the cumulant with respect to θ respectively yield the mean and variance functions, given as:

$$E[Y_i] = b'(\theta_i) = \frac{\delta b}{\delta \mu_i}\,\frac{\delta \mu_i}{\delta \theta_i} = -\frac{k}{\mu_i}\left(-(1 - \mu_i)\right) = \frac{k(1 - \mu_i)}{\mu_i}$$

$$V[Y_i] = b''(\theta_i) = \frac{\delta^{2} b}{\delta \mu_i^{2}}\left(\frac{\delta \mu_i}{\delta \theta_i}\right)^{2} + \frac{\delta b}{\delta \mu_i}\,\frac{\delta^{2} \mu_i}{\delta \theta_i^{2}} = \frac{k}{\mu_i^{2}}(1 - \mu_i)^{2} + \frac{k(1 - \mu_i)}{\mu_i} = \frac{k(1 - \mu_i)}{\mu_i^{2}} \tag{3.40}$$

The variance function $V(\mu)$ therefore equals $k(1 - \mu_i)/\mu_i^{2}$. Assume we now parameterize μᵢ and k in terms of πᵢ and γ:

$$\frac{1 - \mu_i}{\mu_i} = \gamma \pi_i, \qquad \gamma = \frac{1}{k}$$

Given these defined values, the negative binomial PDF can then be re-parameterized in terms of πᵢ and γ, which makes it identical to equation 3.32, the form derived via the Poisson-gamma mixture.
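Finally, the mean-dispersion form in equations 3.32 and 3.33 can be fitted directly in software. A sketch using statsmodels' NegativeBinomial model, which estimates the dispersion (called alpha there, corresponding to k in this section) by maximum likelihood; the simulated data and parameter values are illustrative only.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 2000
x = rng.uniform(0, 1, size=n)
X = sm.add_constant(x)
mu = np.exp(0.2 + 1.0 * x)                     # log-linear mean, equation 3.33
k = 0.7                                        # dispersion
lam = rng.gamma(shape=1.0 / k, scale=k * mu)   # gamma heterogeneity per observation
y = rng.poisson(lam)                           # NB counts via the Poisson-gamma mixture

result = sm.NegativeBinomial(y, X).fit(disp=False)
print(result.params)   # intercept, slope, and alpha (the estimated dispersion)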