6.1 A BRIEF OVERVIEW OF BAYESIAN METHODOLOGY
out additional information is typically required. The idea is that under certain conditions one may find the inverse probability of an event, usually with the additional information. The notion of additional information is key to Bayesian methodology.
Six principal features distinguish Bayesian regression models from traditional maximum likelihood models such as logistic regression. Realize, though, that these features are simplifications; the details are somewhat more complicated.
1. Regression Models Have Slope, Intercept, and Sigma Parameters: Each parameter has an associated prior.
2. Parameters Are Randomly Distributed: The regression parameters to be estimated are themselves randomly distributed. In traditional, or frequentist-based, logistic regression the estimated parameters are fixed. All main effects parameter estimates are based on the same underlying PDF.
3. Parameters May have Different Distributions: In Bayesian logistic regression, each parameter is separate, and may be described using a different distribution.
4. Parameter Estimates as the Means of a Distribution: When estimating a Bayesian parameter, an analyst develops a posterior distribution from the likelihood and prior distributions. The mean (or median, or mode) of the posterior distribution is regarded as the beta, parameter estimate, or Bayesian coefficient of the variable.
5. Credible Sets Used Instead of Confidence Intervals: Equal-tailed credible sets are usually defined by the 0.025 and 0.975 quantiles of the posterior distribution of a Bayesian parameter. Posterior intervals, or highest posterior density (HPD) regions, are used when the posterior is highly skewed or is bimodal or multimodal in shape. There is a 95% probability that the credible set contains the true posterior mean. Confidence intervals, by contrast, are based on a frequency interpretation of statistics as defined in Chapter 2, Section 2.3.4. (A short R sketch following this list illustrates both types of interval.)
6. Additional or Prior Information: The distribution used as the basis of a parameter estimate (the likelihood) can be mixed with additional information, that is, information we know about the variable or parameter that is independent of the data being used in the model. This is called a prior distribution. Priors are PDFs that add information from outside the data into the model.
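To make items 4 and 5 concrete, the short R sketch below computes a posterior mean, an equal-tailed 95% credible set, and an HPD interval from a vector of posterior draws. The draws here are simulated as a stand-in for real MCMC output, and the coda package is assumed to be installed.

set.seed(123)
draws <- rnorm(10000, mean = 0.8, sd = 0.25)      # stand-in for posterior draws of one coefficient

mean(draws)                                        # point estimate: posterior mean (item 4)
quantile(draws, c(0.025, 0.975))                   # equal-tailed 95% credible set (item 5)

library(coda)                                      # HPD region, useful for skewed or multimodal posteriors
HPDinterval(as.mcmc(draws), prob = 0.95)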
The basic formula that defines a Bayesian model is:
f(\theta \mid y) = \frac{f(y \mid \theta)\, f(\theta)}{f(y)} = \frac{f(y \mid \theta)\, f(\theta)}{\int f(y \mid \theta)\, f(\theta)\, d\theta}     (6.1)

where f(y|θ) is the likelihood function and f(θ) is the prior distribution. The denominator, f(y), is the probability of y over all y. Note that the likelihood and prior distributions are multiplied together. Usually the denominator, which is the normalization term, drops out of the calculations, so that the posterior distribution of a model predictor is determined by the product of its likelihood and prior. Again, each predictor can have a different posterior.
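The following sketch works through Equation 6.1 numerically for a single parameter, using a hypothetical binomial likelihood and a Beta(2, 2) prior evaluated on a grid of theta values; the data and prior are illustrative only.

theta <- seq(0.001, 0.999, length.out = 1000)      # grid over the parameter
y <- 7; n <- 10                                     # hypothetical data: 7 successes in 10 trials

likelihood <- dbinom(y, size = n, prob = theta)     # f(y | theta)
prior      <- dbeta(theta, 2, 2)                    # f(theta)
unnorm     <- likelihood * prior                    # numerator of Equation 6.1

width     <- theta[2] - theta[1]                    # grid spacing
posterior <- unnorm / sum(unnorm * width)           # divide by f(y), the normalization term

sum(theta * posterior * width)                      # posterior mean of theta

Whether or not we divide by f(y), the shape of the posterior, and hence the location of its mean, is unchanged, which is why the normalization term can usually be dropped.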
If an analyst believes that there is no meaningful outside information that bears on the predictor, a uniform prior will usually be given. When this happens the prior is not informative.
A prior having a normal distribution with a mean of 0 and a very high variance will also produce a noninformative, or diffuse, prior. If all priors in the model are noninformative, the maximum likelihood results will be nearly identical to the Bayesian betas. In our first examples below we will use noninformative priors.
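As a sketch of this point, the code below fits a logistic model by maximum likelihood with glm() and then with MCMCpack's MCMClogit() using its default flat prior (B0 = 0), on hypothetical simulated data; with a noninformative prior the posterior means should be very close to the maximum likelihood estimates.

library(MCMCpack)

set.seed(1)
x   <- rnorm(500)
y   <- rbinom(500, 1, plogis(-1 + 0.5 * x))         # hypothetical binary outcome
dat <- data.frame(y = y, x = x)

fit.ml    <- glm(y ~ x, family = binomial, data = dat)        # frequentist fit
fit.bayes <- MCMClogit(y ~ x, data = dat,
                       burnin = 1000, mcmc = 10000,
                       b0 = 0, B0 = 0)                        # flat (noninformative) prior

coef(fit.ml)
summary(fit.bayes)                                            # posterior means and credible sets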
I should mention that priors are a way to provide a posterior distribution with more information than is available in the data itself, as reflected in the likelihood function. If a prior is weak, it will not provide much additional information, and the posterior will not be much different than it would be with a completely noninformative prior. In addition, what may serve as an influential informative prior in a model with few observations may well be weak when applied to data with a large number of observations.
It is important to remember that priors are not specific bits of information but are rather distributions with parameters, which are combined with likelihood distributions. A major difficulty most analysts have when employing a prior in a Bayesian model is specifying the prior parameters that correctly describe the additional information being added to the model. Again, priors are combined with the likelihood (multiplied on the probability scale, or added on the log scale) to form a posterior for each term in the regression.
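As a sketch of how prior parameters are specified, MCMClogit() takes a multivariate normal prior whose mean is b0 and whose precision (1/variance) is B0; translating a belief such as "the slope is near 1 with standard deviation 0.1" into a precision of 1/0.1^2 = 100 is exactly the step analysts find difficult. The data frame dat is the hypothetical one simulated above.

fit.inform <- MCMClogit(y ~ x, data = dat,
                        burnin = 1000, mcmc = 10000,
                        b0 = c(0, 1),               # prior means: intercept 0, slope 1
                        B0 = diag(c(0.01, 100)))    # prior precisions: vague for the intercept, tight for the slope
summary(fit.inform)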
There is much more that can be discussed about Bayesian modeling, in particular Bayesian logistic modeling. But this would take us beyond the scope we set for this book. I provide the reader with several suggested books on the subject at the end of the chapter.
How Bayesian logistic regression works is best understood through examples. I will show an example using R's MCMCpack package (located on CRAN), followed by the modeling of the same data using JAGS. JAGS is regarded by many in the area as one of the most powerful, if not the most powerful, Bayesian modeling packages. It was developed from WinBUGS and OpenBUGS and uses much of the same notation. However, it has more built-in functions and more capabilities than the BUGS packages. BUGS is an acronym for "Bayesian inference Using Gibbs Sampling"; it was developed by the MRC Biostatistics Unit in Cambridge, United Kingdom. More will be mentioned about the BUGS packages and JAGS at the start of Chapter 6, Section 6.2.2. Stata 14 was released on April 7, 2015, well after this text was written. Stata now has full Bayesian capabilities, and I was able to include Stata code at the end of this chapter for Bayesian logistic models with noninformative and Cauchy priors.
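As a preview of the JAGS notation discussed in Section 6.2.2, the sketch below specifies the same kind of logistic model with diffuse normal priors and runs it through the rjags package; the data list reuses the hypothetical data simulated above, and the chain settings are illustrative only.

library(rjags)

model.string <- "
model {
  for (i in 1:N) {
    y[i] ~ dbern(p[i])
    logit(p[i]) <- beta0 + beta1 * x[i]
  }
  beta0 ~ dnorm(0, 0.0001)       # diffuse normal priors; JAGS parameterizes by precision, not variance
  beta1 ~ dnorm(0, 0.0001)
}"

jags.data <- list(y = dat$y, x = dat$x, N = nrow(dat))
jm <- jags.model(textConnection(model.string), data = jags.data,
                 n.chains = 3, n.adapt = 1000)
update(jm, 1000)                                    # burn-in
samp <- coda.samples(jm, c("beta0", "beta1"), n.iter = 10000)
summary(samp)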