A practical technique to estimate multinomial probit models
in transportation
Denis Bolduc *
Département d'économique, Université Laval, Sainte-Foy, Québec, Canada G1K 7P4
Received 14 September 1994; received in revised form 1 June 1998
Abstract
The Multinomial Probit (MNP) formulation provides a very general framework to allow for interdependent alternatives in discrete choice analysis. Until recently, its use was rather limited, mainly because of the computational difficulties associated with evaluating the choice probabilities, which are multidimensional normal integrals. In recent years, the econometric estimation of MNP models has received considerable attention. Alternative simulation-based approaches have been suggested and compared. Most approaches exploit a conventional estimation technique in which easy-to-compute simulators replace the choice probabilities. For situations such as transportation demand modelling, where samples and choice sets are large, the existing literature clearly suggests the use of a maximum simulated likelihood (MSL) framework combined with a Geweke-Hajivassiliou-Keane (GHK) choice probability simulator. The present paper gives the computational details regarding the implementation of this practical estimation approach where the scores are computed analytically. This represents a contribution of the paper because numerical derivatives are usually used. The approach is tested on a 9-mode transportation choice model estimated with disaggregate data from Santiago, Chile.
Résumé
The multinomial probit (MNP) formulation makes it possible to analyze and describe, in a very flexible way, the choice of an individual among a set of interdependent alternatives. The considerable progress made in recent years in the econometric estimation of MNP models now makes it possible to circumvent the problem of evaluating the multiple normal integrals that define the choice probabilities of the alternatives. The various approaches considered generally exploit efficient simulators that act as substitutes for the exact choice probabilities. The generally favoured simulator is the GHK, suggested independently by Geweke, Hajivassiliou and Keane. For situations such as those generally encountered in transportation, where samples and choice sets are large, the literature very clearly suggests a maximum likelihood approach that uses the GHK simulator to approximate the choice probabilities. The present paper provides the details on the use of this methodology within a maximum likelihood framework with analytical derivatives. The approach is then tested on a data set describing the choice among nine modes linking downtown Santiago to outlying areas.
© 1998 Elsevier Science Ltd. All rights reserved.
Transportation Research Part B 33 (1999) 63-79
0191-2615/98/$ - see front matter. PII: S0191-2615(98)00028-9
Keywords: Multinomial probit; Simulation based estimate; Discrete choice; Transportation demand modeling
1. Introduction
Since the introduction of discrete choice techniques to analyze transportation-related problems, hundreds of studies have focused on the behavioral aspects of the decision process of individuals choosing among a finite set of alternatives. The operational model most often used has the Multinomial Logit (MNL) form. Its major advantage over more general strategies is that its choice probabilities have a closed form that can be calculated easily. However, the assumption made by this model that the alternatives are mutually independent is often restrictive.
An attractive solution to this problem is to use the Multinomial Probit (MNP) framework. The interdependencies are then accounted for through the correlation structure of an error term assumed to be normally distributed. Any error correlation structure can be postulated. Until recently, its use was rather limited, mainly because of the computational difficulties associated with evaluating the choice probabilities, which are multidimensional normal integrals. Recently, alternative simulation-based approaches have been suggested. They are described and compared in Hajivassiliou et al. (1996). Most approaches exploit a conventional estimation technique in which easy-to-compute simulators replace the choice probabilities. For situations such as transportation demand modelling, where samples and choice sets are large, the maximum simulated likelihood framework combined with a Geweke-Hajivassiliou-Keane (GHK) choice probability simulator should be favoured. This paper gives the computational details regarding the implementation of this particular estimation strategy where, to speed computation, the score vector associated with the likelihood function is computed using analytic expressions. The approach is tested on a 9-mode transportation choice model using disaggregate data from Santiago, Chile.
2. The multinomial probit formulation
A typical transportation mode choice model concerns the choice by individual $n$, $n = 1, \ldots, N$, of the alternative $i$ in the set $C_n = \{1, \ldots, J_n\}$ which produces the highest utility level $V_{in}$, i.e. such that $V_{in} \geq V_{jn}$, $\forall j \in C_n$. In this notation, the choice set is allowed to differ across individuals, to account for their own specific travel mode availabilities. In estimation, it is very important to take this choice set variability into account. To present the proposed estimation approach intelligibly, it is easier to first assume that each individual faces the same choice set $C$ and then, in a second step, to allow for individual specific choice sets.
2.1. MNP with universal choice set
Assuming that each individual $n$ faces the same $J$ alternatives, an MNP model formulation based on linear-in-parameters utilities may be written as follows:

$$V_{in} = Z_{in}\beta + \varepsilon_{in},$$

with

$$y_{in} = \begin{cases} 1 & \text{if } V_{in} \geq V_{jn} \text{ for } j = 1, \ldots, J, \\ 0 & \text{otherwise.} \end{cases}$$

The variable $y_{in}$ designates the choice made by individual $n$, $V_{in}$ is the unobservable utility of alternative $i$ as perceived by individual $n$, and $Z_{in}$ is a $(1 \times K)$ vector of explanatory variables characterizing both the alternative $i$ and the individual $n$. $\beta$ is a $(K \times 1)$ vector of fixed parameters and, finally, $\varepsilon_{in}$ is a normally distributed random error term of mean zero assumed to be correlated with the errors associated with the other alternatives $j$, $j = 1, \ldots, J$, $j \neq i$. In vector form, one can write this relationship as:

$$V_n = Z_n\beta + \varepsilon_n, \qquad \varepsilon_n \sim N(0, \Sigma), \tag{1}$$

where $V_n$ and $\varepsilon_n$ are $(J \times 1)$ vectors and $Z_n$ is a $(J \times K)$ matrix.
As is well known, the only identifiable parameters in the original model (1) are those that can be retrieved uniquely from the parameters of a scaled model differenced with respect to the utility of an arbitrary alternative. Below, we use the last alternative as the base, and the scaling is performed by fixing to one the variance of the first error term in the differenced model. This version, which is referred to here as the estimable model, can be written as:

$$U_n = X_n\beta + \nu_n, \qquad \nu_n \sim N(0, \Omega), \tag{2}$$

which is the model in Eq. (1) written in deviation with respect to the utility of the last alternative $V_{Jn}$. More specifically, let $m = J - 1$; then $U_n$ is an $(m \times 1)$ vector with components $U_{in} = V_{in} - V_{Jn}$, $i = 1, \ldots, m$. The matrix $X_n$ and the vector $\nu_n$ are defined similarly. The scaling is such that $\mathrm{var}(\nu_{1n}) = \mathrm{var}(\varepsilon_{1n} - \varepsilon_{Jn}) = 1$.^1 To impose a positive definite error covariance matrix $\Omega$, it is preferable to work with a formulation based on a Cholesky decomposition of $\Omega$. Such an equivalent model is written as:

$$U_n = X_n\beta + S w_n, \qquad w_n \sim N(0, I_m), \tag{3}$$

where $S$ is a lower triangular matrix such that $\Omega = SS'$, with $s_{11} = 1$ to be consistent with the scaling used. The parameters one is interested in are the $K$ parameters $\beta_k$ in the vector $\beta$ and the $p = m(m+1)/2 - 1$ parameters $s_{21}, s_{31}, \ldots, s_{m1}, s_{22}, \ldots, s_{mm}$ that we incorporate in a $(p \times 1)$ vector denoted as $s$. To complete the notation, we call $\theta = (\beta', s')'$ the joint vector of parameters formed by the vertical concatenation of $\beta$ and $s$.

1 To set $\mathrm{var}(\nu_{1n}) = 1$ is equivalent to dividing each row of the differenced model by the quantity $s = \sqrt{\mathrm{var}(\varepsilon_{1n} - \varepsilon_{Jn})}$. In that case, one would get $U_{in} = (V_{in} - V_{Jn})/s$ and the vector $\beta$ should be replaced with $\beta/s$.
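The differencing, scaling and Cholesky steps described above can be illustrated numerically. The following is only a minimal sketch, assuming NumPy is available; the function name `estimable_cholesky` and the 4-alternative covariance matrix are hypothetical and do not come from the paper.

```python
import numpy as np

def estimable_cholesky(Sigma):
    """Difference against the last alternative, scale, and factor the result."""
    J = Sigma.shape[0]
    m = J - 1
    D = np.hstack([np.eye(m), -np.ones((m, 1))])  # U_n = D V_n
    Omega = D @ Sigma @ D.T                       # covariance of the differenced errors
    Omega = Omega / Omega[0, 0]                   # scaling: var(nu_1n) = 1
    S = np.linalg.cholesky(Omega)                 # lower triangular, Omega = S S'
    return Omega, S

# Hypothetical 4-alternative covariance matrix, used only for illustration.
A = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.3, 1.0, 0.0, 0.0],
              [0.2, 0.4, 1.0, 0.0],
              [0.1, 0.2, 0.5, 1.0]])
Sigma = A @ A.T
Omega, S = estimable_cholesky(Sigma)

m = S.shape[0]
p = m * (m + 1) // 2 - 1                          # free Cholesky elements (s_11 is fixed)
# Stack the lower triangle column by column and drop s_11.
s = np.array([S[i, j] for j in range(m) for i in range(j, m)])[1:]
```

With $J = 4$ alternatives, $m = 3$ and the vector $s$ holds $p = 5$ elements, ordered $s_{21}, s_{31}, s_{22}, s_{32}, s_{33}$.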
2.1.1. The utilities in deviation with respect to the chosen alternative
To evaluate the log-likelihood function requires calculating, for each individual in the sample, the probability associated with the choice made. To compute the GHK choice probability simulator for a given individual $n$, it is convenient to map the utilities into differences with respect to the utility of the choice made, so that the probability $P_n(i)$ then becomes an integral over a non-positive orthant. This particular representation can be obtained by premultiplying the estimable model in Eq. (2) by an $(m \times m)$ linear operator that we call $M_i$. For any $i$ other than $J$, the rows of $M_i U_n$ return $U_{jn} - U_{in} = V_{jn} - V_{in}$ for the alternatives $j \neq i, J$ and $-U_{in} = V_{Jn} - V_{in}$ for the last alternative; for $i = J$, $M_J = I_m$. Premultiplying Eq. (2) by matrix $M_i$ gives:

$$\tilde U_n = M_i U_n = M_i X_n\beta + M_i\nu_n = \tilde X_n\beta + \tilde\nu_n, \tag{4}$$

where $\tilde\nu_n \sim N(0, \Omega_i)$, with $\Omega_i = M_i \Omega M_i'$. Note that the $\tilde U_n$ vector thus obtained contains only negative components: $\tilde U_{1n} < 0, \tilde U_{2n} < 0, \ldots, \tilde U_{mn} < 0$. As seen below, the GHK simulator exploits this particular feature. The vector $\tilde\nu_n$ is defined similarly to $\tilde U_n$, and the matrix $\tilde X_n$ is of dimension $(m \times K)$ with rows $(Z_{1n} - Z_{in}), \ldots, (Z_{Jn} - Z_{in})$. Finally, note that because of Eq. (3), one may also write:

$$V(\tilde\nu_n) = \Omega_i = M_i \Omega M_i' = M_i S S' M_i'. \tag{5}$$

Eqs. (4) and (5) are particularly useful because they are expressed in terms of the estimable parameters in $\beta$ and the vector $s$ which, one can recall, is composed of the elements of the Cholesky matrix $S$ in Eq. (3).
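One possible construction of the mapping matrix is sketched below, assuming NumPy. The function name `mapping_matrix` is hypothetical, and the row ordering (alternatives $1, \ldots, J$ with the chosen one skipped) is an assumption made for illustration; any row permutation would serve equally well.

```python
import numpy as np

def mapping_matrix(i, J):
    """(m x m) matrix M_i for the chosen alternative i (1-based), with m = J - 1.

    Rows are ordered by alternative j = 1, ..., J with j != i (assumed ordering).
    """
    m = J - 1
    # Row j < J picks U_jn; the extra last row stands for alternative J (U_Jn = 0).
    A = np.vstack([np.eye(m), np.zeros((1, m))])
    if i < J:
        A[:, i - 1] -= 1.0                  # subtract U_in from every row
    return np.delete(A, i - 1, axis=0)      # drop the row of the chosen alternative

# Check on hypothetical utilities: M_i U_n equals V_jn - V_in for all j != i.
rng = np.random.default_rng(0)
J = 5
V = rng.normal(size=J)
U = V[:-1] - V[-1]                          # deviations from the last alternative
checks = [np.allclose(mapping_matrix(i, J) @ U, np.delete(V - V[i - 1], i - 1))
          for i in range(1, J + 1)]
```

When $i$ indexes the alternative with the highest utility, every component of `mapping_matrix(i, J) @ U` is non-positive, which is the property the GHK simulator relies on.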
2.2. MNP with individual specific choice set
To account for individual specific choice sets is relatively straightforward. Let $n$ denote an individual ($n = 1, \ldots, N$) who chooses the alternative $i$ in choice set $C_n = \{1, \ldots, J_n\}$ for which $V_{in} \geq V_{jn}$, $\forall j \in C_n$. The corresponding MNP model is obtained by excluding from Eq. (1) those rows associated with the unavailable alternatives. In terms of the estimable model, this amounts to removing the appropriate rows from Eq. (2). Such a formulation can be obtained with the use of an $(m_n \times m)$ mapping matrix $E_n$, where $m_n = J_n - 1$. It is defined as an identity matrix with the rows associated with the unavailable alternatives deleted. In the case of a universal choice set, $E_n$ would simply be an identity matrix of size $m$. With this in mind, one can therefore use a unique and general notation to refer to both cases of the MNP model formulation.
2.3. A general notation for the MNP model
Consider the previously introduced $(m \times m)$ mapping matrix $M_i$ that maps the $m = J - 1$ utility differences $U_n$ of the estimable model in Eq. (2) into a deviation with respect to the alternative $i$ chosen by individual $n$. Consider also the $(m_n \times m)$ mapping matrix $E_n$ which removes the utilities associated with the alternatives unavailable to $n$. Those two operations can be combined into the following operator:

$$M_{in} = E_n M_i. \tag{6}$$

This $(m_n \times m)$ mapping matrix, when premultiplying the utility vector $U_n$, gives:

$$\tilde U_n = M_{in} U_n.$$

This is the $(m_n \times 1)$ vector of the utilities associated with the alternatives available to individual $n$, expressed in deviation with respect to the chosen alternative. Note again that in the case of a universal choice set, $M_{in} = M_i$, $\forall n$, since $E_n$ would be an identity matrix. Using the $M_{in}$ matrix and the estimable model in Eq. (2), the final notation that we exploit to compute the choice probabilities is:

$$\tilde U_n = M_{in} U_n = M_{in} X_n\beta + M_{in}\nu_n = \tilde X_n\beta + \tilde\nu_n, \tag{7}$$

where $\tilde\nu_n \sim N(0, \Omega_{in})$, with $\Omega_{in} = M_{in} S S' M_{in}'$. This notation should make clear that, depending on the mode availability and the choice made, the error covariance matrix varies between observations. However, it should be noted that a common Cholesky matrix $S$ appears in the covariance structure. The matrix $\tilde X_n$ represents an $(m_n \times K)$ matrix of the deviations of the explanatory variables of each available alternative $j$ other than alternative $i$ with respect to the explanatory variables of the chosen alternative $i$. Again, note that by definition, $\tilde U_n$ is a vector with non-positive components, i.e. $\tilde U_{1n} \leq 0, \ldots, \tilde U_{m_n,n} \leq 0$. To compute a given choice probability, we will also need to consider a Cholesky transformed version of this formulation. This is written as:

$$\tilde U_n = \tilde X_n\beta + S_n w_n, \qquad w_n \sim N(0, I_{m_n}), \tag{8}$$

$$S_n S_n' = \Omega_{in} = M_{in} S S' M_{in}'. \tag{9}$$

For estimation, we will exploit this relationship between the individual specific Cholesky matrix $S_n$ and the unique Cholesky matrix $S$ containing the parameters to estimate. To clarify the distinction between those two matrices, note that $S_n$ refers to the formulation expressed in deviation with respect to the chosen alternative with only the available alternatives being considered, while $S$ refers to the estimable model expressed in deviation with respect to the last alternative with all the alternatives included, whether they are available or not. Obviously, a given observation $n$ contributes to estimating only those elements of $S$ referring to the available alternatives.
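The link between the common matrix $S$ and the individual specific matrix $S_n$ in Eqs. (6) and (9) can be sketched as follows, assuming NumPy. The helper names, the row ordering of $M_i$ and the numerical values of $S$ are hypothetical choices made for illustration only.

```python
import numpy as np

def mapping_matrix(i, J):
    """M_i for chosen alternative i (1-based); rows ordered by j != i (assumed)."""
    m = J - 1
    A = np.vstack([np.eye(m), np.zeros((1, m))])
    if i < J:
        A[:, i - 1] -= 1.0
    return np.delete(A, i - 1, axis=0)

def availability_matrix(i, available, J):
    """E_n: identity of size m with the rows of unavailable alternatives deleted."""
    others = [j for j in range(1, J + 1) if j != i]        # row order of M_i U_n
    keep = [r for r, j in enumerate(others) if j in available]
    return np.eye(J - 1)[keep, :]

def individual_cholesky(i, available, S):
    """Return M_in = E_n M_i (Eq. (6)) and S_n with S_n S_n' = M_in S S' M_in' (Eq. (9))."""
    J = S.shape[0] + 1
    M_in = availability_matrix(i, available, J) @ mapping_matrix(i, J)
    Omega_in = M_in @ S @ S.T @ M_in.T
    return M_in, np.linalg.cholesky(Omega_in)

# Hypothetical common Cholesky factor for J = 4 alternatives (s_11 = 1).
S = np.array([[1.0, 0.0, 0.0],
              [0.5, 1.2, 0.0],
              [-0.3, 0.4, 0.9]])
# Individual who chooses alternative 2 and has alternatives 1, 2 and 4 available.
M_in, S_n = individual_cholesky(i=2, available={1, 2, 4}, S=S)
```

Here $J_n = 3$, so $M_{in}$ is $(2 \times 3)$ and $S_n$ is $(2 \times 2)$, while the parameters to estimate remain those of the single $(3 \times 3)$ matrix $S$.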
3. Model estimation
3.1. The choice probabilities
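Denote $P_n(i)$ as the choice probability associated with the alternative $i$, $i \in C_n$, chosen by individual $n$. Given the formulations in Eqs. (7) or (8), this is also the probability of drawing $\tilde U_n$ with each component $\tilde U_{1n}, \ldots, \tilde U_{m_n,n}$ being non-positive. It can be computed as an $m_n$-dimensional integral of the form:

$$P_n(i) = \int_{\{\tilde\nu_n :\ \tilde X_n\beta + \tilde\nu_n \leq 0\}} n(\tilde\nu_n; \Omega_{in})\, d\tilde\nu_n, \tag{10}$$

where $n(\tilde\nu_n; \Omega_{in})$ is a multivariate normal density with mean zero and covariance matrix $\Omega_{in}$. Unless $m_n$ is small, the choice probability in Eq. (10) cannot be computed using a numerical integration technique. One solution is to simulate $P_n(i)$. Many simulators with good properties have been suggested. The most useful ones are described and compared in Hajivassiliou et al. (1996). The GHK simulator, which is used here, was clearly found to be the one with the best theoretical and empirical properties. Theoretical and analytical details on this simulator may be found in Börsch-Supan and Hajivassiliou (1993), Hajivassiliou (1993) and Geweke et al. (1992). Still, we provide computational details regarding its implementation because we need specific expressions to derive analytical relationships to compute the scores. The GHK simulator exploits the recursive structure imposed by the Cholesky transformation present in Eq. (8). For a given observation $n$, using Eq. (8), one can write:

$$\begin{aligned}
\tilde U_{1n} \leq 0 &\;\Rightarrow\; w_{1n} \leq -\tilde X_{1n}\beta / s_{11,n}, \\
\tilde U_{2n} \leq 0 &\;\Rightarrow\; w_{2n} \leq -(\tilde X_{2n}\beta + s_{21,n} w_{1n}) / s_{22,n}, \\
&\;\;\vdots \\
\tilde U_{m_n,n} \leq 0 &\;\Rightarrow\; w_{m_n,n} \leq -\Big(\tilde X_{m_n,n}\beta + \sum_{h=1}^{m_n - 1} s_{m_n h,n} w_{hn}\Big) / s_{m_n m_n,n}.
\end{aligned} \tag{11}$$

To write it in a more compact way, we use the notation:

$$w_{1n} \leq a_{1n}, \quad w_{2n} \leq a_{2n}(w_{1n}), \quad \ldots, \quad w_{m_n,n} \leq a_{m_n,n}(w_{1n}, \ldots, w_{m_n-1,n}). \tag{12}$$

Therefore, the choice probability $P_n(i)$ can be written as:

$$P_n(i) = \mathrm{pr}(\tilde U_n \leq 0) = \mathrm{pr}\big(w_{1n} \leq a_{1n},\ w_{2n} \leq a_{2n}(w_{1n}),\ \ldots,\ w_{m_n,n} \leq a_{m_n,n}(w_{1n}, \ldots, w_{m_n-1,n})\big).$$

By conditioning, one can also write:

$$\begin{aligned}
P_n(i) &= \mathrm{pr}(\tilde U_{1n} \leq 0)\,\mathrm{pr}(\tilde U_{2n} \leq 0 \mid \tilde U_{1n} \leq 0) \cdots \mathrm{pr}(\tilde U_{m_n,n} \leq 0 \mid \tilde U_{1n} \leq 0, \tilde U_{2n} \leq 0, \ldots, \tilde U_{m_n-1,n} \leq 0) \\
&= \mathrm{pr}(w_{1n} \leq a_{1n})\,\mathrm{pr}(w_{2n} \leq a_{2n}(w_{1n}) \mid w_{1n} \leq a_{1n}) \cdots
\end{aligned}$$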
3.2. The GHK simulator
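Let $r$ denote a draw. Call $w_{nr}$ a given realization $r$ of the vector $w_n$ such that Eq. (11) is satisfied. Based on $R$ such draws, an unbiased simulator for $P_n(i)$ is provided by the following expression:

$$\tilde f_n(i) = \frac{1}{R}\sum_{r=1}^{R} f_{nr}(i), \tag{13}$$

where

$$f_{nr}(i) = \mathrm{pr}(w_{1n,r} \leq a_{1n,r})\,\mathrm{pr}(w_{2n,r} \leq a_{2n,r}(w_{1n,r}) \mid w_{1n,r} \leq a_{1n,r}) \cdots$$

is a choice probability conditional on $w_{nr}$. The GHK simulator defined in Eq. (13) is smooth with respect to the parameters $\beta_k$, $k = 1, \ldots, K$, and the $s_{11,n}, \ldots, s_{m_n m_n,n}$ elements forming the lower part of the Cholesky matrix $S_n$. It is also known to have good asymptotic properties. For proofs, refer to Hajivassiliou and McFadden (1990). For notational compactness and because of the normality assumption, we can write $f_{nr}(i)$ as follows:

$$f_{nr}(i) = \Phi(a_{1n,r})\,\Phi(a_{2n,r}) \cdots \Phi(a_{m_n,n,r}) = \Phi_{1nr}\,\Phi_{2nr} \cdots \Phi_{m_n,nr}, \tag{14}$$

where $\Phi$ denotes a standard normal cumulative distribution function. Using Eq. (12), the $a_{ln,r}$ are computed as:

$$a_{ln,r} = -\frac{\tilde X_{ln}\beta + \sum_{h=1}^{l-1} s_{lh,n}\, w_{hn,r}}{s_{ll,n}}, \tag{15}$$

where $w_{hn,r} = \Phi^{-1}\big(u_{hn,r}\,\Phi(a_{hn,r})\big)$ and $u_{hn,r}$ is a random uniform number taken from the (0,1) interval.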
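The recursion in Eqs. (13)-(15) can be sketched in a few lines of standard-library Python. This is an illustrative sketch rather than the paper's Fortran 77 program: the function name `ghk_probability`, the argument layout and the clamping of the uniform draw away from 0 and 1 are assumptions introduced here.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def ghk_probability(xb, S, R=1000, seed=0):
    """Simulate pr(U~ <= 0) for U~ = xb + S w, w ~ N(0, I), with R GHK draws.

    xb : list of the m deterministic terms X~_ln beta
    S  : lower triangular Cholesky factor S_n, given as a list of rows
    """
    rng = random.Random(seed)
    m = len(xb)
    total = 0.0
    for _ in range(R):
        w = []       # truncated standard normal draws w_1, ..., w_m
        f = 1.0      # running product of Eq. (14)
        for l in range(m):
            # Upper truncation point a_l of Eq. (15), given the earlier draws.
            a = -(xb[l] + sum(S[l][h] * w[h] for h in range(l))) / S[l][l]
            Phi = STD_NORMAL.cdf(a)
            f *= Phi
            # Draw w_l from N(0, 1) truncated to (-inf, a] by inverting the CDF;
            # the clamp keeps the argument strictly inside (0, 1).
            p = min(max(rng.random() * Phi, 1e-300), 1.0 - 1e-16)
            w.append(STD_NORMAL.inv_cdf(p))
        total += f
    return total / R                      # Eq. (13)

# Bivariate check with correlation 0.5: the exact orthant probability is
# 1/4 + arcsin(0.5) / (2 pi) = 1/3.
rho = 0.5
S = [[1.0, 0.0], [rho, (1.0 - rho ** 2) ** 0.5]]
p_hat = ghk_probability([0.0, 0.0], S, R=5000, seed=1)
```

Because each factor $\Phi(a_{ln,r})$ is a smooth function of $\beta$ and of the elements of $S_n$ for fixed uniform draws, the simulated probability is differentiable, which is what makes analytic scores possible.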
3.3. The simulated likelihood function
The estimation method considered is based on the maximisation of the natural logarithm of the simulated likelihood function. Denoting as $\theta$ the joint vector of parameters to estimate (i.e. it contains $\beta$ and $s$), the simulated log-likelihood function, which is written as:

$$L = \sum_{n=1}^{N} \ln \tilde f_n(i) = \sum_{n=1}^{N} \ln\Big[\frac{1}{R}\sum_{r=1}^{R} f_{nr}(i)\Big], \tag{16}$$

is maximized with respect to $\theta$. Technical details on the computation of the analytical first-order derivatives are provided in Appendix A. To our knowledge, this is the first implementation of the simulated likelihood framework based on the GHK probability simulator which uses analytical instead of numerical derivatives.^2 The computation of $\partial L/\partial\beta$ is quite straightforward when compared to the computation of $\partial L/\partial s$. The latter is computed using a chain rule which exploits a Jacobian transformation between $s$ and $s_n$. This transformation arises from the relationships in Eq. (9). Appendix A provides the details on the computation of $\partial L/\partial s_n$. The derivative $\partial L/\partial s$ of the log-likelihood function with respect to the vector $s$ of interest is evaluated as $(\partial s_n'/\partial s)(\partial L/\partial s_n)$, where $\partial s_n'/\partial s$ is a Jacobian matrix whose calculation is detailed in Appendix B. In the implementation of the estimation algorithm, we use a BHHH approach which avoids the need to compute the second-order derivatives, which are in this case rather involved. The computer program, written in Fortran 77, has gone through several stages of testing. Different Monte Carlo based tests were made on a SUN workstation. We now use it on some real data describing a choice situation among nine inter-related travel mode alternatives.

2 One of the referees claimed that numerical derivatives should be more reliable and as fast as the suggested approach
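The BHHH idea, in which the outer product of the per-observation scores replaces the Hessian, can be illustrated on a much simpler model than the MNP. The sketch below applies it to a binary probit with analytic scores, assuming NumPy; the function names, the step-halving rule, the stopping tolerance and the simulated data are all assumptions made for illustration and are not taken from the paper's program.

```python
import math
import numpy as np

def probit_scores(theta, X, y):
    """Per-observation log-likelihoods and analytic scores of a binary probit."""
    q = 2.0 * y - 1.0
    z = q * (X @ theta)
    Phi = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    phi = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return np.log(Phi), (q * phi / Phi)[:, None] * X

def bhhh(theta, X, y, tol=1e-8, max_iter=200):
    """Maximize the log-likelihood using only first-order derivatives."""
    for _ in range(max_iter):
        ll, G = probit_scores(theta, X, y)
        g = G.sum(axis=0)                     # score vector
        d = np.linalg.solve(G.T @ G, g)       # BHHH direction: (sum g_n g_n')^{-1} g
        if g @ d < tol:
            break
        step = 1.0                            # halve the step until the fit improves
        while step > 1e-10 and probit_scores(theta + step * d, X, y)[0].sum() < ll.sum():
            step *= 0.5
        theta = theta + step * d
    return theta

# Hypothetical simulated data set.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(4000), rng.normal(size=4000)])
theta_true = np.array([0.5, -1.0])
y = (X @ theta_true + rng.normal(size=4000) > 0).astype(float)
theta_hat = bhhh(np.zeros(2), X, y)
```

In the MNP setting, the rows of `G` would be replaced by the per-observation derivatives of $\ln \tilde f_n(i)$ with respect to $\theta$.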
4. An application
To test the methodology, we use a data bank about the choice of modes for the morning peak journey to work to the central business district (CBD) of Santiago. This data bank has been described and employed several times in the past. Gaudry et al. (1989) focused on the estimation of the valuation of time savings by the transportation mode users. They found that the values obtained with linear Logit or linear nested Logit specifications were particularly high. By using nonlinear specifications of Box-Cox Logit type, they were able to obtain more satisfactory results.^3 In our application, we use the same data and the same model specification in order to find out whether allowing for a flexible correlation structure of the utilities can lead to reasonable value of time (VOT) estimates. Details on the sample size, the different mode shares and their availability are displayed in Table 1.
We will find that doing so certainly helps to improve the VOT estimates. Still, there seems to remain some room for improvement. Implementing a Box-Cox technique within an MNP setting represents too formidable a task. We suspect that an MNP extension of the MNL formulation with lognormally distributed VOT implemented in Ben-Akiva et al. (1993) would be a reasonable approach to address this problem. Our exploratory analyses using the random VOT based MNL specification clearly point in this direction. Such a more general MNP model framework with randomly distributed VOT coefficients will be considered in later research.
3 In order to implement their Box-Cox methodology, they removed observations that contained zero values for some

The important variables entering the model specification are travel time $t_{in}$ and $c_{in}/w_n$, the cost for individual $n$ of travelling by mode $i$ as a proportion of his/her net personal income per min of work [Chilean pesos/min (\$)]. Using cost as a proportion of income allows the VOT to vary deterministically with income. The other VOT related variables used in the specification are walking and in vehicle time for all modes and a waiting time variable for all modes other than car. Finally, in addition to eight alternative specific dummies, two other explanatory variables of socio-economic type appear. The first one, specific to car driver alternatives 1 and 6, indicates the number of cars in the household as a proportion of the number of driving permit holders. The last variable is a sex dummy included for modes 2, 3 and 7 listed in Table 1. The model estimates obtained using a basic MNP i.i.d. specification are displayed in column 1 of Table 2. An MNP specification with i.i.d. errors can easily be estimated using Gauss-Hermite quadrature to compute the choice probabilities entering the log-likelihood function. Only 12 quadrature points were used to perform the integrals. The solution was obtained in 14 iterations using 0 as the starting value for all the parameters. The implied VOT estimates as a percentage of net income are produced in the same column of Table 3. Those values are in the same range as the ones obtained using a linear logit specification of Gaudry et al. (1989). The value of time sensitivity in vehicle, as a percentage of net personal income, is computed as:
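$$\frac{\partial (c_{in}/w_n)}{\partial t_{in}} = \frac{\partial V_{in}/\partial t_{in}}{\partial V_{in}/\partial (c_{in}/w_n)} = \frac{\beta_{t_{in}}}{\beta_{c_{in}/w_n}}.$$

As is well known, in a linear specification, the VOT measures are evaluated as ratios of parameters. The two other VOT estimates were produced similarly. Standard errors were computed using the delta method.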
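The ratio-of-parameters computation and its delta-method standard error can be sketched as follows. The coefficients are the rounded MNP i.i.d. values printed in Table 2, so the result only approximates the entries of Table 3; the standard errors are backed out of the printed t-statistics, and the covariance between the two estimates is not reported in the paper and is assumed to be zero here. The function name is hypothetical.

```python
import math

def ratio_and_delta_se(b_num, b_den, var_num, var_den, cov=0.0):
    """Ratio b_num / b_den and its delta-method standard error."""
    ratio = b_num / b_den
    g_num = 1.0 / b_den                 # d(ratio) / d(b_num)
    g_den = -b_num / b_den ** 2         # d(ratio) / d(b_den)
    var = g_num ** 2 * var_num + g_den ** 2 * var_den + 2.0 * g_num * g_den * cov
    return ratio, math.sqrt(var)

# Rounded MNP i.i.d. estimates and t-statistics from Table 2.
b_time, t_time = -0.05, -5.50           # in vehicle time (min)
b_cost, t_cost = -0.02, -8.61           # cost/income
var_time = (b_time / t_time) ** 2       # squared standard error implied by the t-statistic
var_cost = (b_cost / t_cost) ** 2

# ASSUMPTION: zero covariance between the two estimates.
vot, se = ratio_and_delta_se(b_time, b_cost, var_time, var_cost)
vot_percent = 100.0 * vot               # VOT as a percentage of net personal income
```

With unrounded coefficients and the full estimated covariance matrix, the same function would be applied to each of the three time variables.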
Table 1
Statistics on the sample used

Alternative                 Chosen   Share (proportion)   Availability
1. Car-driver                  168              0.12933            681
2. Car-passenger                66              0.05081            730
3. Shared-taxi                  58              0.04465            833
4. Metro                       295              0.22710            407
5. Bus                         430              0.33102           1287
6. Car-driver-metro            101              0.07775            530
7. Car-passenger-metro          41              0.03156            594
8. Shared-taxi-metro            65              0.05004            828
9. Bus-metro                    75              0.05774            841

4.1. The model formulation with correlated utilities
To capture the similarities between the transportation modes and to keep a rather parsimonious parametric specification of the error covariance structure, we use the first-order Generalized Autoregressive [GAR(1)] process approach suggested in Bolduc (1992). In order to obtain such a formulation, the original model in Eq. (1) is replaced with:

$$V_n = Z_n\beta + T P^{-1}\zeta_n, \qquad \zeta_n \sim N(0, I_J), \tag{17}$$

where $T$ is a $J$-diagonal matrix which contains standard deviation $\sigma_i$ in the $i$th position, and $P$ is a matrix for capturing the covariance effects using functions based on few underlying unknown parameters. It can be viewed as a restricted version of a saturated factor analytic formulation. This model can be obtained from an initial model $V_n = Z_n\beta + T\xi_n$, with $\xi_n = \rho W\xi_n + \zeta_n$ being assumed. This last autoregressive process is assumed to be based on a $(J \times J)$ Boolean (0-1) contiguity matrix $W$ which, in this particular application, relates the first two alternatives together and does the same with the last seven alternatives. The $W$ matrix used is as follows:
$$W = \begin{pmatrix}
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 1 & 1 & 1 & 1 & 1 \\
0 & 0 & 1 & 0 & 1 & 1 & 1 & 1 & 1 \\
0 & 0 & 1 & 1 & 0 & 1 & 1 & 1 & 1 \\
0 & 0 & 1 & 1 & 1 & 0 & 1 & 1 & 1 \\
0 & 0 & 1 & 1 & 1 & 1 & 0 & 1 & 1 \\
0 & 0 & 1 & 1 & 1 & 1 & 1 & 0 & 1 \\
0 & 0 & 1 & 1 & 1 & 1 & 1 & 1 & 0
\end{pmatrix}$$

To ensure the invertibility of the $P = (I_J - \rho W)$ matrix, so that $\xi_n$ could be replaced with $P^{-1}\zeta_n$ for a value of $\rho$ defined on the (-1, 1) interval, the $W$ matrix is normalized so that each row sums to one. In the postulated structure, the correlation coefficient $\rho$ and a maximum of $J - 1$ standard error terms can be estimated. [For more details on the use of GAR(1) processes to approximate the error covariance structure in discrete choice modelling and on parameter estimability issues, refer to Bolduc (1992).] In our application, the second standard deviation term $\sigma_2$ is set to 1.

Table 2
Estimation results

Variables                          MNP i.i.d.       SML MNP R50      SML MNP R50      SML MNP R250     SML MNP R250
                                                    homoscedastic    unconstrained    unconstrained    constrained
3. Shared-taxi                     0.61 (4.74)      0.82 (5.25)      0.61 (2.31)      0.51 (1.66)      0.27 (0.89)
4. Metro                           3.11 (16.56)     3.10 (15.97)     2.81 (5.78)      2.73 (5.84)      2.88 (5.88)
5. Bus                             1.49 (12.30)     1.65 (11.11)     1.47 (8.27)      1.45 (8.38)      1.49 (7.96)
6. Car-driver-metro                -0.02 (-0.01)    0.18 (0.80)      -0.59 (-1.15)    -0.49 (-1.01)    -0.59 (-1.20)
7. Car-passenger-metro             0.36 (2.68)      0.58 (3.60)      -1.46 (-0.96)    -1.94 (-1.07)    -2.33 (-1.11)
8. Shared-taxi-metro               0.66 (3.65)      0.90 (4.54)      -0.09 (-0.16)    -0.14 (-0.22)    -0.20 (-0.62)
9. Bus-metro                       0.90 (5.02)      1.12 (5.78)      1.23 (6.60)      1.20 (6.74)      1.25 (7.72)
Other variables
Cost/income ($)                    -0.02 (-8.61)    -0.02 (-7.59)    -0.02 (-3.89)    -0.02 (-3.81)    -0.02 (-3.83)
Walk time (min)                    -0.08 (-9.37)    -0.08 (-9.24)    -0.07 (-4.04)    -0.06 (-3.98)    -0.07 (-4.01)
In vehicle time (min)              -0.05 (-5.50)    -0.05 (-5.62)    -0.04 (-3.51)    -0.04 (-3.48)    -0.05 (3.55)
Waiting time (min)                 -0.25 (-5.28)    -0.18 (-6.15)    -0.13 (-3.56)    -0.13 (-3.54)    -0.13 (-3.64)
No. cars/no. permit holders        1.29 (5.61)      1.24 (5.44)      1.54 (3.16)      1.45 (3.13)      1.52 (3.14)
  (alternatives 1 and 6)
Sex dummies (male=1)               -0.42 (-3.64)    -0.39 (-3.69)    -0.41 (-3.21)    -0.43 (-3.27)    -0.49 (-3.53)
  (alternatives 2, 3 and 7)
Log-likelihood                     -1482.39         -1473.60         -1447.40         -1442.83         -1443.86
Number of iterations               14               7                20               22               18
Run time (min and s)/iteration     0.06             1.44             1.50             7.26             6.43
  on SUN UltraSparc 1
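The error covariance matrix implied by the GAR(1) structure of Eq. (17), $T P^{-1} P^{-1\prime} T$ with $P = I_J - \rho W$, can be sketched numerically, assuming NumPy. The function name `gar1_covariance` and the value $\rho = 0.5$ are hypothetical and are not estimates from the paper; the two blocks reproduce the contiguity pattern of the $W$ matrix used in the application.

```python
import numpy as np

def gar1_covariance(rho, sigma, blocks):
    """Covariance T P^{-1} P^{-1}' T implied by Eq. (17), with P = I - rho W."""
    J = len(sigma)
    W = np.zeros((J, J))
    for block in blocks:                      # alternatives within a block are contiguous
        for a in block:
            for b in block:
                if a != b:
                    W[a, b] = 1.0
    W = W / W.sum(axis=1, keepdims=True)      # normalize so that each row sums to one
    P = np.eye(J) - rho * W
    P_inv = np.linalg.inv(P)
    T = np.diag(sigma)
    return T @ P_inv @ P_inv.T @ T

# Modes 1-2 and modes 3-9 form the two blocks (0-based indices below).
blocks = [range(0, 2), range(2, 9)]
# Homoscedastic case: all sigmas fixed to 1; rho = 0.5 is a hypothetical value.
Sigma = gar1_covariance(rho=0.5, sigma=np.ones(9), blocks=blocks)
```

Since $W$ is block diagonal, the resulting covariance matrix is block diagonal as well: the car modes 1 and 2 are correlated with each other but not with modes 3 to 9.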
4.2. Estimation results
Estimation results of the different MNP versions considered are displayed in Table 2. Column 2 refers to the SML MNP solution based on 50 draws, assuming cross-correlation between the alternative specific errors and homoscedasticity. In other words, all sigmas are fixed to 1 and only $\rho$ is estimated. The results obtained clearly show that correlation is present between the utilities. The GAR(1) setting makes it possible to summarize the full correlation structure using a single correlation coefficient. The fit is better but the estimated parameters are not so different from the MNP i.i.d. solution. Columns 3 and 4 of Table 2 refer to the model specification where seven standard deviation terms are estimated. Recall that for identification, the second standard deviation term is fixed to 1. Therefore, in this specification, referred to as unconstrained in the tables, each utility has its own heteroscedasticity effect. According to the estimation results obtained, heteroscedasticity is significantly present. The model fit is much better than with the homoscedastic structure. Values of time estimates, especially when 250 draws are used, are much lower than in the MNP i.i.d. case.

One known problem associated with maximum simulated likelihood is the bias introduced by simulation. This is because the GHK is a technique to simulate the choice probability, not its natural logarithm. The presence of the bias can be proved using Jensen's inequality. It is also known that the bias becomes insignificant with a large number of simulation draws. This justifies our use of 250 draws for estimation. Note that with 50 draws, the results are very close to those obtained with a large number of draws. The improvement in the fit using a larger number of draws can be attributed to the presence of the bias. All this indicates that the GHK simulator performs very well.

Table 3
Value of time as a percentage of net personal income

Variable           MNP i.i.d.     SML MNP R50     SML MNP R50     SML MNP R250    SML MNP R250
                                  homoscedastic   unconstrained   unconstrained   constrained
In vehicle time    201.7 (4.72)   211.3 (4.64)    184.0 (4.43)    198.6 (4.42)    214.5 (4.50)
Walking time       345.7 (6.46)   353.9 (6.05)    288.2 (5.16)    297.7 (5.04)    315.8 (5.01)
Waiting time       773.5 (4.74)   854.5 (4.83)    558.8 (4.03)    590.1 (3.99)    606.3 (4.02)
The last column of Table 2 refers to a constrained version of the model where the heteroscedastic structure is postulated to be consistent with the definition of the alternatives provided in Table 1. In this version, we are postulating homoscedasticity among groups of alternatives. The groups, formed using the restrictions $\sigma_1 = \sigma_6$; $\sigma_2 = 1$; $\sigma_3 = \sigma_4 = \sigma_8$; $\sigma_5 = \sigma_9$, with $\sigma_7$ remaining free, are car-driver, car-passenger, taxi-metro, bus and car-passenger-metro. This is done to demonstrate the flexibility of the approach. In this case, the simulated log-likelihood value is -1443.86, which is marginally higher in absolute value than the one corresponding to the unrestricted version of the model, which indicates that the restrictions make sense. This is confirmed with a log-likelihood ratio test. All parameter estimates closely resemble the values obtained with the unconstrained model. The run time per iteration was approximately 6.4 min.
The constrained estimation gains in terms of the number of iterations needed to reach the optimum, but the computation time per iteration is almost the same: since only mappings are applied in computing the derivatives, a comparable number of manipulations is required at a given iteration. The implied VOT estimates for in vehicle time, walking time and waiting time are not as good as those obtained with the more general error structure. The unconstrained version performs best in producing smaller VOT estimates. However, there appears to be some room for improvement. An MNP formulation with lognormally distributed VOT coefficients is a potentially good alternative for solving the problem of high VOT estimates. This is left for future research.
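The log-likelihood ratio test mentioned above can be reproduced from the values printed in Table 2. The sketch below uses only the Python standard library. The number of restrictions is not stated in the text; three is an assumption made here (seven free standard deviations in the unconstrained model against four in the constrained one), and the closed-form survival function is specific to three degrees of freedom.

```python
import math

def chi2_sf_3df(x):
    """Survival function of a chi-square variable with 3 degrees of freedom."""
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))

ll_unconstrained = -1442.83      # Table 2, SML MNP R250 unconstrained
ll_constrained = -1443.86        # Table 2, SML MNP R250 constrained

lr_stat = 2.0 * (ll_unconstrained - ll_constrained)
p_value = chi2_sf_3df(lr_stat)   # ASSUMPTION: 3 restrictions
reject_at_5_percent = lr_stat > 7.815   # 5% critical value of a chi-square(3)
```

Under this assumption the statistic is far below the 5% critical value, in line with the statement that the restrictions are not rejected.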
5. Conclusion
Our application demonstrated the feasibility of MNP estimation when applied to choice situations based on large choice sets. In the most general case that we considered in the application, each utility was characterized by a specific standard error and the correlation among two natural blocks of utilities was modelled using a generalized autoregressive error structure. Based on the observed improvements in the log-likelihood values, taking into account these inter-relationships between the model errors was important. The technique was applied to the 9-mode transportation choice model considered in Gaudry et al. (1989).
Acknowledgements
The author would like to thank Professor Marc Gaudry for kindly providing the Santiago data bank used for this study.
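Appendix A. The first-order derivatives
A.1. The simulated likelihood function
The estimation method considered here maximises the natural logarithm of the simulated likelihood function. Noting that $\theta$ denotes the joint vector of parameters to estimate (i.e. it contains $\beta$ and $s$), then the simulated log-likelihood function is written as:

$$L = \sum_{n=1}^{N} \ln \tilde f_n(i) = \sum_{n=1}^{N} \ln\Big[\frac{1}{R}\sum_{r=1}^{R} f_{nr}(i)\Big], \tag{A1}$$

where $f_{nr}(i)$ is the conditional probability of choosing alternative $i$ given a particular draw $r$ of the vector $w_n$ in Eq. (8) of the text. Note that we use directly the conventions established in Eqs. (11)-(15). We now provide the details regarding the computation of the first-order derivatives.
A.2. First-order derivatives
Recall that the joint vector of parameters is denoted as $\theta$. Using Eq. (A1), we get:

$$\frac{\partial L}{\partial\theta} = \sum_{n=1}^{N} \frac{1}{\tilde f_n(i)}\,\frac{1}{R}\sum_{r=1}^{R} \frac{\partial f_{nr}(i)}{\partial\theta}, \qquad \frac{\partial f_{nr}(i)}{\partial\theta} = f_{nr}(i)\sum_{l=1}^{m_n} \frac{\phi(a_{ln,r})}{\Phi(a_{ln,r})}\,\frac{\partial a_{ln,r}}{\partial\theta},$$

where the second expression follows from Eq. (14) and $\phi$ denotes the standard normal density. In the computation of $\partial a_{ln,r}/\partial\theta$, the following recursion:

$$w_{hn,r} = \Phi^{-1}\big(u_{hn,r}\,\Phi(a_{hn,r})\big), \tag{A4}$$

needs to be taken into account. Recall that $u_{hn,r}$ denotes a particular draw from the random uniform distribution defined over the unit interval. We now give the explicit relationships for $\beta$ and $s$.
A.3. Derivatives with respect to $\beta$

$$\frac{\partial L}{\partial\beta} = \sum_{n=1}^{N} \frac{1}{\tilde f_n(i)}\,\frac{1}{R}\sum_{r=1}^{R} f_{nr}(i)\sum_{l=1}^{m_n} \frac{\phi(a_{ln,r})}{\Phi(a_{ln,r})}\,\frac{\partial a_{ln,r}}{\partial\beta},$$

and where, using Eq. (15) and the recursion in Eq. (A4), one has:

$$\frac{\partial a_{ln,r}}{\partial\beta} = -\frac{1}{s_{ll,n}}\Big[\tilde X_{ln}' + \sum_{h=1}^{l-1} s_{lh,n}\,\frac{\partial w_{hn,r}}{\partial\beta}\Big], \qquad \frac{\partial w_{hn,r}}{\partial\beta} = \frac{u_{hn,r}\,\phi(a_{hn,r})}{\phi(w_{hn,r})}\,\frac{\partial a_{hn,r}}{\partial\beta}.$$

A.4. Derivatives with respect to $s$
As previously mentioned, this derivative is computed using a chain rule linking $s_n$ to $s$. Below we provide $\partial L/\partial s_n$. Then, $\partial L/\partial s$ is computed as $(\partial s_n'/\partial s)(\partial L/\partial s_n)$, where the Jacobian matrix $\partial s_n'/\partial s$ is detailed in Appendix B. Now recall that the $s_n$ vector is formed by concatenating the elements $s_{ij,n}$, $i \geq j$, in a column vector. Then the elements of the $\partial L/\partial s_n$ vector are:

$$\frac{\partial L}{\partial s_{ij,n}} = \frac{1}{\tilde f_n(i)}\,\frac{1}{R}\sum_{r=1}^{R} f_{nr}(i)\sum_{l=1}^{m_n} \frac{\phi(a_{ln,r})}{\Phi(a_{ln,r})}\,\frac{\partial a_{ln,r}}{\partial s_{ij,n}},$$

where, using Eq. (15), one can note that:

$$\frac{\partial a_{ln,r}}{\partial s_{ij,n}} = -\frac{1}{s_{ll,n}}\Big[\mathbf{1}(l = i,\ j < l)\,w_{jn,r} + \mathbf{1}(l = i = j)\,a_{ln,r} + \sum_{h=1}^{l-1} s_{lh,n}\,\frac{\partial w_{hn,r}}{\partial s_{ij,n}}\Big], \qquad \frac{\partial w_{hn,r}}{\partial s_{ij,n}} = \frac{u_{hn,r}\,\phi(a_{hn,r})}{\phi(w_{hn,r})}\,\frac{\partial a_{hn,r}}{\partial s_{ij,n}}.$$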
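The analytic score with respect to $\beta$ can be checked against a numerical derivative for a single observation. The sketch below uses only the standard library and is an illustration under assumptions: the function names, the small $(3 \times 2)$ design matrix, the Cholesky factor and the uniform draws are hypothetical, and the recursion is the one implied by Eqs. (14), (15) and (A4).

```python
import math
import random
from statistics import NormalDist

N01 = NormalDist()

def ghk_loglik_and_score(X, beta, S, u):
    """ln f~_n(i) and its gradient with respect to beta for one observation.

    X : m x K rows X~_ln; S : lower triangular S_n; u : R x m uniform draws.
    """
    m, K = len(X), len(beta)
    sum_f, sum_df = 0.0, [0.0] * K
    for u_r in u:
        w, dw = [], []                   # draws w_h and their derivatives w.r.t. beta
        f, dlogf = 1.0, [0.0] * K
        for l in range(m):
            xb = sum(X[l][k] * beta[k] for k in range(K))
            a = -(xb + sum(S[l][h] * w[h] for h in range(l))) / S[l][l]
            da = [-(X[l][k] + sum(S[l][h] * dw[h][k] for h in range(l))) / S[l][l]
                  for k in range(K)]
            Phi, phi = N01.cdf(a), N01.pdf(a)
            f *= Phi
            dlogf = [dlogf[k] + phi / Phi * da[k] for k in range(K)]
            w_l = N01.inv_cdf(u_r[l] * Phi)               # recursion (A4)
            w.append(w_l)
            dw.append([u_r[l] * phi / N01.pdf(w_l) * da[k] for k in range(K)])
        sum_f += f
        sum_df = [sum_df[k] + f * dlogf[k] for k in range(K)]
    return math.log(sum_f / len(u)), [d / sum_f for d in sum_df]

def numeric_score(X, beta, S, u, h=1e-6):
    """Central finite differences using the same uniform draws."""
    out = []
    for k in range(len(beta)):
        up = [b + h * (j == k) for j, b in enumerate(beta)]
        dn = [b - h * (j == k) for j, b in enumerate(beta)]
        out.append((ghk_loglik_and_score(X, up, S, u)[0]
                    - ghk_loglik_and_score(X, dn, S, u)[0]) / (2.0 * h))
    return out

# Hypothetical inputs for one observation.
rng = random.Random(0)
X = [[1.0, 0.5], [-0.3, 0.8], [0.2, -1.0]]
beta = [0.4, -0.2]
S = [[1.0, 0.0, 0.0], [0.5, 1.2, 0.0], [-0.3, 0.4, 0.9]]
u = [[rng.uniform(1e-9, 1.0 - 1e-9) for _ in range(3)] for _ in range(200)]
loglik, score = ghk_loglik_and_score(X, beta, S, u)
numeric = numeric_score(X, beta, S, u)
```

The uniform draws are held fixed across evaluations, which is what keeps the simulated log-likelihood smooth in $\beta$ and makes the two gradients comparable.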
Appendix B. The jacobian matrix
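In this appendix, we detail the relationship between $s_n$ and $s$. From Eq. (9), one can write:

$$\Omega_{in} = S_n S_n' = M_{in} S S' M_{in}'. \tag{B1}$$

Let $L$ be a matrix such that $\mathrm{vec}(S) = L's$ and let $K'$ be the matrix that maps $s$ to $\mathrm{vec}(S')$. Then Eq. (B1) may be written as:

$$\mathrm{vec}(\Omega_{in}) = (M_{in}S \otimes M_{in})\,\mathrm{vec}(S) = (M_{in}S \otimes M_{in})L's = (M_{in} \otimes M_{in}S)\,\mathrm{vec}(S') = (M_{in} \otimes M_{in}S)K's.$$

Then this implies that

$$\frac{\partial\,\mathrm{vec}(\Omega_{in})}{\partial s'} = (M_{in}S \otimes M_{in})L' + (M_{in} \otimes M_{in}S)K' = B_{in}'. \tag{B2}$$

Similarly, call $L_n'$ and $K_n'$ the matrices mapping $s_n$ to $\mathrm{vec}(S_n)$ and $\mathrm{vec}(S_n')$, respectively. Then:

$$\mathrm{vec}(\Omega_{in}) = \mathrm{vec}(S_n S_n') = (S_n \otimes I_{m_n})L_n's_n = (I_{m_n} \otimes S_n)K_n's_n,$$

and therefore:

$$\frac{\partial\,\mathrm{vec}(\Omega_{in})}{\partial s_n'} = (S_n \otimes I_{m_n})L_n' + (I_{m_n} \otimes S_n)K_n' = A_n'. \tag{B3}$$

Taking a pseudo-inverse of the matrix in Eq. (B3), denoted with the superscript $+$, then:

$$\frac{\partial s_n'}{\partial\,\mathrm{vec}(\Omega_{in})} = A_n'^{+},$$

which finally implies that:

$$\frac{\partial s_n'}{\partial s} = \frac{\partial\,\mathrm{vec}(\Omega_{in})'}{\partial s}\,\frac{\partial s_n'}{\partial\,\mathrm{vec}(\Omega_{in})} = B_{in} A_n'^{+}, \tag{B4}$$

where $B_{in}$ and $A_n'$ are detailed in Eqs. (B2) and (B3), respectively.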
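The Jacobian of Eqs. (B2)-(B4) can be checked numerically, assuming NumPy. In the sketch below the helper names are hypothetical, `vec` stacks columns, and the vector $s$ is taken to include $s_{11}$ for simplicity (in the paper $s_{11} = 1$ is fixed, so the corresponding column would be dropped). The function returns $\partial s_n/\partial s'$, which is the transpose of the matrix $\partial s_n'/\partial s$ in Eq. (B4).

```python
import numpy as np

def vech_index(m):
    """(i, j) positions of the lower triangle, taken column by column."""
    return [(i, j) for j in range(m) for i in range(j, m)]

def vech(S):
    return np.array([S[i, j] for i, j in vech_index(S.shape[0])])

def unvech(s, m):
    S = np.zeros((m, m))
    for k, (i, j) in enumerate(vech_index(m)):
        S[i, j] = s[k]
    return S

def selection_matrices(m):
    """L' maps s to vec(S); K' maps s to vec(S') (column-major vec)."""
    idx = vech_index(m)
    Lt = np.zeros((m * m, len(idx)))
    Kt = np.zeros((m * m, len(idx)))
    for k, (i, j) in enumerate(idx):
        Lt[j * m + i, k] = 1.0           # position of S[i, j] in vec(S)
        Kt[i * m + j, k] = 1.0           # position of S'[j, i] in vec(S')
    return Lt, Kt

def jacobian_sn_wrt_s(M, S):
    """d s_n / d s' obtained from Eqs. (B2) and (B3) with a pseudo-inverse."""
    m, mn = S.shape[0], M.shape[0]
    Sn = np.linalg.cholesky(M @ S @ S.T @ M.T)
    Lt, Kt = selection_matrices(m)
    Lnt, Knt = selection_matrices(mn)
    MS = M @ S
    Bt = np.kron(MS, M) @ Lt + np.kron(M, MS) @ Kt                       # Eq. (B2)
    At = np.kron(Sn, np.eye(mn)) @ Lnt + np.kron(np.eye(mn), Sn) @ Knt   # Eq. (B3)
    return np.linalg.pinv(At) @ Bt

def numeric_jacobian(M, S, h=1e-6):
    """Central finite differences of vech(S_n) with respect to vech(S)."""
    m, s = S.shape[0], vech(S)
    cols = []
    for k in range(len(s)):
        e = np.zeros(len(s))
        e[k] = h
        Sp, Sm = unvech(s + e, m), unvech(s - e, m)
        up = np.linalg.cholesky(M @ Sp @ Sp.T @ M.T)
        dn = np.linalg.cholesky(M @ Sm @ Sm.T @ M.T)
        cols.append((vech(up) - vech(dn)) / (2.0 * h))
    return np.column_stack(cols)

# Hypothetical inputs: J = 4, alternative 1 chosen, alternatives 2 and 4 available.
M = np.array([[-1.0, 1.0, 0.0], [-1.0, 0.0, 0.0]])
S = np.array([[1.0, 0.0, 0.0], [0.5, 1.2, 0.0], [-0.3, 0.4, 0.9]])
J_analytic = jacobian_sn_wrt_s(M, S)
J_numeric = numeric_jacobian(M, S)
```

The pseudo-inverse is needed because the matrix in Eq. (B3) has more rows than columns; it has full column rank as long as $S_n$ has a positive diagonal.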
References
Bolduc, D., 1992. Generalized autoregressive errors in the multinomial probit model. Transportation Research B: Methodological 26B(2), 155-170.
Börsch-Supan, A., Hajivassiliou, V., 1993. Smooth unbiased multivariate probability simulators for maximum likelihood estimation of limited dependent variable models. Journal of Econometrics 58, 347-368.
Gaudry, M.J.I., Jara-Diaz, S.R., Ortuzar, J., 1989. Value of time sensitivity to model specification. Transportation Research B: Methodological 23B(2), 151-158.
Geweke, J., Keane, M., Runkle, D., 1992. Alternative computational approaches to inference in the multinomial probit model. Research Department, Federal Reserve Bank of Minneapolis.
Hajivassiliou, V.A., 1993. Simulation estimation methods for limited dependent variable models. In: Maddala, G.S., Rao, C.R., Vinod, H.D. (Eds.), Handbook of Statistics, Vol. 11. North Holland, Amsterdam, pp. 519-543.
Hajivassiliou, V.A., McFadden, D., 1990. The method of simulated scores for the estimation of LDV models with an application to external debt crises. Working paper, Cowles Foundation for Research in Economics, Yale University, Connecticut.