MCMC algorithms for two recent Bayesian limited information
estimators
Chuanming Gao, Kajal Lahiri*
Department of Economics, State University of New York at Albany, Albany, NY 12222, USA
Received 22 March 1999; accepted 19 July 1999
Abstract
Recent developments in Bayesian limited information analysis of simultaneous equations models, e.g., Chao and Phillips (1998) [Chao, J.C., Phillips, P.C.B., 1998. Posterior distributions in limited information analysis of the simultaneous equations model using the Jeffreys prior. Journal of Econometrics 87, 49–86] and Kleibergen and van Dijk (1998) [Kleibergen, F., van Dijk, H.K., 1998. Bayesian simultaneous equation analysis using reduced rank structures. Econometric Theory 14, 731–743], provide new choices for empirical practitioners. This note proposes a ''Gibbs within M-H'' algorithm to explore the non-standard posterior densities resulting from these Bayesian approaches and illustrates the procedure with a simple labor supply model from Goldberger (1998) [Goldberger, A.S., 1998. Introductory Econometrics. Harvard University Press, Cambridge, MA]. © 2000 Elsevier Science S.A. All rights reserved.
Keywords: Jeffreys prior; Endogeneity; Gibbs sampler; Metropolis-Hastings algorithm
JEL classification: C11; C30
1. Introduction
Recently, Chao and Phillips (1998, hereafter CP) and Kleibergen and van Dijk (1998, hereafter KVD) have developed two elegant Bayesian approaches to analyzing simultaneous equations models. CP uses an explicit Jeffreys prior which places zero weight in the region of the parameter space where the problem of local non-identification occurs. KVD treats an overidentified simultaneous equations model (SEM) as a linear model with nonlinear (reduced rank) parameter restrictions, and shows that the two are uniquely related through a singular value decomposition. A diffuse or a natural conjugate prior for the parameters of the embedding linear model results in the posterior for the parameters of the SEM having zero weight in the region of the parameter space where local non-identification occurs.
*Corresponding author. Tel.: +1-518-442-4758; fax: +1-518-442-4735.
E-mail address: [email protected] (K. Lahiri)
While CP was mainly interested in deriving exact or approximate representations for the posterior density of the structural coefficients, KVD suggested a Markov chain Monte Carlo (MCMC) algorithm to evaluate their derived posterior density. The purpose of this note is to propose a more efficient simulator for the posterior densities of the two approaches and to illustrate it using an empirical example from Goldberger (1998).
2. Review of the Bayesian approaches
Consider the following limited information formulation of the m-equation simultaneous equations model (LISEM):
y₁ = Y₂β + Z₁γ + u,  (1)
Y₂ = Z₁Π₁ + Z₂Π₂ + V₂,  (2)

where y₁: (T × 1) and Y₂: (T × (m − 1)) contain the m included endogenous variables; Z₁: (T × k₁) is an observation matrix of exogenous variables included in the structural equation (1); Z₂: (T × k₂) is an observation matrix of exogenous variables excluded from (1); and u and V₂ are, respectively, a T × 1 vector and a T × (m − 1) matrix of random disturbances to the system. We assume that (u, V₂) ~ N(0, Σ ⊗ I_T), where the m × m covariance matrix Σ is positive definite symmetric and is partitioned conformably with the rows of (u, V₂) as

Σ = [ σ₁₁  σ₂₁′ ]
    [ σ₂₁  Σ₂₂ ].
Denote k = k₁ + k₂. The likelihood function for the model (1) and (2) is

p(y₁, Y₂ | β, γ, Π₁, Π₂, Σ) ∝ (2π)^(−Tm/2) |Σ|^(−T/2) exp{ −(1/2) tr[ Σ⁻¹ (u, V₂)′(u, V₂) ] },  (3)

with u = y₁ − Y₂β − Z₁γ and V₂ = Y₂ − Z₁Π₁ − Z₂Π₂. Substituting (2) into (1) yields the restricted reduced form

(y₁  Y₂) = Z₁(π₁  Π₁) + Z₂(Π₂β  Π₂) + (v₁  V₂),  (4)

where π₁ = Π₁β + γ and v₁ = u + V₂β. The likelihood function which corresponds to this alternative representation can be written as

p(y₁, Y₂ | β, π₁, Π₁, Π₂, Ω) ∝ (2π)^(−Tm/2) |Ω|^(−T/2) exp{ −(1/2) tr[ Ω⁻¹ (v₁, V₂)′(v₁, V₂) ] },  (5)

where Ω is the covariance matrix of the rows of (v₁, V₂). CP derives an exact Jeffreys prior for the parameters of (1) and (2), denoted here as prior (6). Note that in the absence of restrictions on Σ, (1) is fully identified if and only if rank(Π₂) = m − 1, and the Jeffreys prior (6) places zero weight where this rank condition fails. The joint posterior of the structural parameters is then proportional to the product of the prior (6) and the likelihood function (3).
The structural model (1) and (2) has the unrestricted reduced form

(y₁  Y₂) = Z₁(π₁  Π₁) + Z₂Φ + (ξ₁  V₂),  (7)

where Φ: k₂ × m. The restricted reduced-form model (4) results if rank(Φ) = m − 1. KVD uses a diffuse (Jeffreys) prior for the parameters of the linear multivariate model (7),
p(π₁, Π₁, Φ, Ω) ∝ |Ω|^(−(k+m+1)/2),  (8)
which implies the following prior for the parameters of (4),

p(β, π₁, Π₁, Π₂, Ω) ∝ |Ω|^(−(m+1)/2) |Ω⁻¹ ⊗ Z′Z|^(1/2) × |(B′ ⊗ I_k₂   e₁ ⊗ Π₂   B⊥ ⊗ Π₂⊥′)|,  (9)

where B = (β  I_(m−1)), and e₁ = (1, 0, 0, . . . , 0)′ is the first m-dimensional unit vector. Note that the prior in (9) also places zero weight where rank(Π₂) < m − 1. The joint posterior of the parameters of (4) is readily constructed as proportional to the product of the prior (9) and the likelihood function (5).
3. A Gibbs within M–H algorithm
Since the posterior densities and their conditionals from the above two approaches do not belong to any standard class of probability densities, Gibbs sampling cannot be performed directly. We suggest a simulation algorithm which combines the ideas of the Gibbs sampler and the Metropolis–Hastings (M–H) algorithm to explore the posteriors.
Suppose we use a candidate-generating density r(x) to generate drawings from the target density p(x). An independence sampler, which is a special case of the M–H sampler, may be written in the following steps:

1. Set i = 1 and choose a starting value x⁽⁰⁾.
2. Generate a candidate drawing x* from r(x).
3. Accept x⁽ⁱ⁾ = x* with probability min{1, [p(x*)/r(x*)] / [p(x⁽ⁱ⁻¹⁾)/r(x⁽ⁱ⁻¹⁾)]}; otherwise set x⁽ⁱ⁾ = x⁽ⁱ⁻¹⁾.
4. Set i = i + 1 and go to step 2.
It is generally not feasible to draw all elements of the parameter vector x simultaneously. A block-at-a-time possibility was first proposed by Hastings (1970, Sec. 2.4). Chib and Greenberg (1995) applied the M-H algorithm in turn to subblocks of the vector x (‘‘M-H within Gibbs’’), such that the target density p(x) is manipulated to generate full conditional densities for each of the subblocks of x. However, in our situation, the full conditionals are not readily available from the target density. Note that if the candidate-generating density r(x) can be successfully simulated in step 2, the above Independence sampler will work. Therefore we propose to use Gibbs sampling in step 2 to generate drawings from r(x).
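As a concrete illustration, the independence sampler can be sketched in a few lines of code. The densities below are toy normals chosen only so the mechanics are easy to verify; they are not the posteriors of Section 2.

```python
import numpy as np

rng = np.random.default_rng(0)

def independence_mh(log_p, log_r, draw_r, n_draws, x0):
    """Independence M-H: candidates come from r(x), independent of the
    current state; acceptance depends on the weights w(x) = p(x)/r(x)."""
    x, draws = x0, []
    for _ in range(n_draws):
        cand = draw_r()
        log_alpha = (log_p(cand) - log_r(cand)) - (log_p(x) - log_r(x))
        if np.log(rng.uniform()) <= min(0.0, log_alpha):
            x = cand          # accept the candidate
        draws.append(x)       # otherwise repeat the previous drawing
    return np.array(draws)

# Toy example: target p = N(1, 1), candidate r = N(0, 2).
draws = independence_mh(
    log_p=lambda x: -0.5 * (x - 1.0) ** 2,
    log_r=lambda x: -0.25 * x ** 2,
    draw_r=lambda: rng.normal(0.0, np.sqrt(2.0)),
    n_draws=50_000, x0=0.0)
print(draws[5000:].mean())    # close to the target mean of 1
```

Note that the candidate density should have tails at least as heavy as the target, otherwise the importance weights w(x) are unbounded and convergence can be slow.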
Consider a simple case where the vector x contains two blocks, x = (x₁, x₂), and the full conditionals r(x₁|x₂) and r(x₂|x₁) are available; these are used in a Gibbs sampler to make independent drawings from the invariant density r(x). The combined algorithm, which we call ''Gibbs within M-H'', replaces step 2 of the independence sampler by a Gibbs step: draw x₁* from r(x₁|x₂⁽ⁱ⁻¹⁾) and x₂* from r(x₂|x₁*), and take x* = (x₁*, x₂*) as the candidate drawing.
Note that successive drawings in the Gibbs sampling step may be serially correlated. This may be overcome by iterating step 2 onto itself a certain number of times before x⁽ⁱ⁾ is taken as a drawing from the joint marginal density r(x) and is passed on for comparison with the previously accepted drawing from the posterior p(x).
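The combined algorithm can be sketched as follows. Again the densities are illustrative toy normals, chosen so that the full conditionals of r are trivial; in the applications of Section 2, r would be the CP likelihood (3) or the KVD embedding-model posterior.

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([1.0, 1.0])     # mean of the toy target p = N(mu, I)

def log_p(x):                 # target density (up to a constant)
    return -0.5 * np.sum((x - mu) ** 2)

def log_r(x):                 # candidate density r = N(0, I)
    return -0.5 * np.sum(x ** 2)

def draw_from_r_by_gibbs(x, n_inner=5):
    """Step 2: iterate the Gibbs step onto itself n_inner times so the
    candidate is (nearly) an independent drawing from r(x). For this toy
    r, the full conditionals r(x1|x2) and r(x2|x1) are both N(0, 1)."""
    x1, x2 = x
    for _ in range(n_inner):
        x1 = rng.normal()     # drawing from r(x1 | x2)
        x2 = rng.normal()     # drawing from r(x2 | x1)
    return np.array([x1, x2])

def gibbs_within_mh(n_draws, x0=np.zeros(2)):
    x, out = x0, []
    for _ in range(n_draws):
        cand = draw_from_r_by_gibbs(x)                    # Gibbs step
        log_alpha = (log_p(cand) - log_r(cand)) - (log_p(x) - log_r(x))
        if np.log(rng.uniform()) <= min(0.0, log_alpha):  # M-H step
            x = cand
        out.append(x)
    return np.array(out)

draws = gibbs_within_mh(40_000)
print(draws[4000:].mean(axis=0))   # close to mu = (1, 1)
```

The inner loop in `draw_from_r_by_gibbs` is the iteration of step 2 onto itself described above; `n_inner` plays the role of the thinning parameter for the Gibbs step.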
4. The empirical example
We implemented our proposed algorithm to estimate a labor supply equation of the following simultaneous equations model from Goldberger (1998):
Supply:  Y₁ = α₁Y₂ + α₂ + α₃X₁ + U₁  (11)
Demand:  Y₂ = α₄Y₁ + α₅ + α₆X₂ + α₇X₃ + α₈X₄ + U₂  (12)
The exogenous variables are Z = [1, X₁, X₂, X₃, X₄], namely, constant term, family size, education, age and ethnicity, respectively. The data set consists of observations on 100 male family heads, interviewed in 1963–1964 [see Mirer (1995)]. It includes: Y₁ = annual number of months worked, Y₂ = monthly wage rate, X₁ = family size, X₂ = education (years), X₃ = age (years), and X₄ = race (coded 0 if white, 1 if black). Since (12) is just identified, we focus on the over-identified supply equation (11).
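The simultaneity bias that motivates instrumenting Y₂ in (11) is easy to reproduce on artificial data. The data-generating values below are illustrative only, not Goldberger's (1998) data; they are chosen so that, as in the empirical example, the disturbance correlation is negative and OLS is biased downward.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 100_000            # large artificial sample (the real data set has T = 100)
a1 = 0.8               # illustrative "true" supply slope

X1 = rng.normal(size=T)                    # included exogenous variable
X2, X3, X4 = rng.normal(size=(3, T))       # instruments excluded from (11)
U1 = rng.normal(size=T)
V2 = -0.5 * U1 + rng.normal(size=T)        # reduced-form error, corr(U1,V2)<0
Y2 = 0.6 * X2 + 0.4 * X3 + 0.3 * X4 + V2   # reduced form for the wage rate
Y1 = a1 * Y2 + 0.5 + 0.3 * X1 + U1         # supply equation, as in (11)

Z = np.column_stack([np.ones(T), X1, X2, X3, X4])  # all exogenous variables
W = np.column_stack([Y2, np.ones(T), X1])          # regressors of (11)

b_ols = np.linalg.lstsq(W, Y1, rcond=None)[0]      # biased: Y2 correlates with U1
W_hat = Z @ np.linalg.lstsq(Z, W, rcond=None)[0]   # first stage: project W on Z
b_2sls = np.linalg.lstsq(W_hat, Y1, rcond=None)[0] # second stage

print(b_ols[0], b_2sls[0])   # OLS well below 0.8; 2SLS close to 0.8
```

With corr(U₁, V₂) < 0, the OLS slope on Y₂ is pulled below the true value, mirroring the gap between the OLS and 2SLS estimates reported in Table 1.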
Note that in CP, the likelihood function (3) is used as the candidate-generating density, whereas in KVD, a posterior density of the embedding linear model (7) with the Jeffreys/diffuse prior serves as the candidate-generating density. To implement the ''Gibbs within M-H'' algorithm, we use the classical LIML estimates and the estimated covariance matrix of the disturbances as starting values. Raftery and Lewis's (1992) convergence diagnostics (gibbsit) were applied to the outputs from the Gibbs and the M–H steps for both approaches. The number of burn-in iterations, the total number of required iterations, and the values of the thinning parameters were estimated and used for a final run to ensure that the sampled drawings are valid representations of the densities of interest. For the current example, the acceptance rate of the M–H algorithm was around 60% for the CP approach but only 6% for KVD, the latter indicating slow convergence.
Before we discuss the results, note that the degree of overidentification for Eq. (11) is 2, the adjusted R² of a regression of Y₂ on Z is 0.29, and the correlation coefficient between the disturbance in (11) and the residuals from the reduced-form regression of Y₂ on Z is −0.25. Fig. 1 shows the posterior marginal densities of α₁ resulting from the two approaches. We report the posterior modes together with standard errors for the parameter α₁ in Table 1. For the sake of comparison, we also report OLS, 2SLS, and classical LIML estimates.
From Fig. 1, we note that the locus of the KVD posterior for α₁ lies to the right of the CP posterior. The KVD approach yields a higher modal value (1.00) than CP (0.84). We also found that the KVD median (1.05) was greater than the CP median (0.80). In addition, the KVD posterior exhibits slightly more dispersion. As noted by Goldberger (1998), the bias in OLS in this example is dramatic, as 2SLS gives a much higher estimate (0.767 as compared to 0.271). The classical LIML estimate (0.814) is slightly higher than 2SLS. Since σ₁₂ is negative, the downward bias of OLS and 2SLS, as compared to LIML, is expected. It is known that LIML is median unbiased. The observed
Table 1
Estimates for the labor supply equation

Estimator     α₁      s.e.
OLS           0.271   0.260
2SLS          0.767   0.472
LIML          0.814   0.482
CP-PMOD(a)    0.840   0.505(b)
KVD-PMOD(a)   1.000   0.555(b)

(a) In the Gibbs sampling step, burn-in is 100 iterations for both CP and KVD, and the thinning parameter is 3 for CP and 2 for KVD. In the M–H algorithm, burn-in is 100 for CP and 200 for KVD, and the thinning parameter is 1 for both. The total number of drawings is 18 400 for CP and 60 500 for KVD, about four times the number of iterations suggested by Raftery and Lewis's (1992) convergence estimator at routine levels of precision. The number of drawings in the final output is 6000 for CP and 30 000 for KVD, which are used to generate Fig. 1.
(b) These values are the standard deviations computed from the final outputs.
posterior loci for CP and KVD as well as their median and modal values suggest that the KVD posterior is somewhat distorted.
References
Chao, J.C., Phillips, P.C.B., 1998. Posterior distributions in limited information analysis of the simultaneous equations model using the Jeffreys prior. Journal of Econometrics 87, 49–86.
Chib, S., Greenberg, E., 1995. Understanding the Metropolis–Hastings algorithm. The American Statistician 49, 327–335.
Goldberger, A.S., 1998. Introductory Econometrics. Harvard University Press, Cambridge, MA.
Hastings, W.K., 1970. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57, 97–109.
Kleibergen, F., van Dijk, H.K., 1998. Bayesian simultaneous equation analysis using reduced rank structures. Econometric Theory 14, 731–743.
Mirer, T.W., 1995. Economic Statistics and Econometrics, 3rd ed. Prentice Hall, Englewood Cliffs, NJ.
Raftery, A.E., Lewis, S.M., 1992. How many iterations in the Gibbs sampler? In: Bernardo, J.M., Smith, A.F.M., Dawid, A.P., Berger, J.O. (Eds.), Bayesian Statistics 4. Oxford University Press, Oxford, pp. 763–773.