
Random Utility Theory

3.3 Some Random Utility Models

3.3.6 The Probit Model

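it follows that

$$
\lim_{\varepsilon_1,\dots,\varepsilon_m\to+\infty} F(\varepsilon_1,\dots,\varepsilon_m)
= \lim_{\varepsilon_1,\dots,\varepsilon_m\to+\infty} \exp\left[-G\left(e^{-\varepsilon_1},\dots,e^{-\varepsilon_m}\right)\right]
= \exp\left[-G(0,\dots,0)\right] = \exp[-0] = 1
$$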

The third property is easily verified, because $F(\cdot)$ is defined by (3.3.51), a continuous function.

Furthermore, it can be demonstrated that the solution of (3.3.55), with $F$ defined as in (3.3.51), actually gives expression (3.3.52) for the choice probabilities defining a GEV model.

Indeed, substituting (3.3.51) in expression (3.3.55), and from the homogeneity of $G(\cdot)$ and $G_j(\cdot)$, it follows that

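$$
\begin{aligned}
p[j] &= \int_{\varepsilon_j=-\infty}^{+\infty} \exp\left[-G\left(e^{V_1-V_j-\varepsilon_j},\dots,e^{V_m-V_j-\varepsilon_j}\right)\right] \cdot G_j\left(e^{V_1-V_j-\varepsilon_j},\dots,e^{V_m-V_j-\varepsilon_j}\right) \cdot e^{-\varepsilon_j}\,d\varepsilon_j \\
&= \int_{\varepsilon_j=-\infty}^{+\infty} \exp\left\{-\left[e^{-(V_j+\varepsilon_j)}\right]^{\mu} G\left(e^{V_1},\dots,e^{V_m}\right)\right\} \cdot \left[e^{-(V_j+\varepsilon_j)}\right]^{\mu-1} \cdot G_j\left(e^{V_1},\dots,e^{V_m}\right) \cdot e^{-\varepsilon_j}\,d\varepsilon_j \\
&= \frac{e^{V_j}\cdot G_j\left(e^{V_1},\dots,e^{V_m}\right)}{\mu\cdot G\left(e^{V_1},\dots,e^{V_m}\right)} \cdot \left[\exp\left\{-\left[e^{-(V_j+\varepsilon_j)}\right]^{\mu} G\left(e^{V_1},\dots,e^{V_m}\right)\right\}\right]_{-\infty}^{+\infty} \\
&= \frac{e^{V_j}\cdot G_j\left(e^{V_1},\dots,e^{V_m}\right)}{\mu\cdot G\left(e^{V_1},\dots,e^{V_m}\right)}
\end{aligned}
$$

which is (3.3.52).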

Multinomial logit, single-level hierarchical logit, multilevel hierarchical logit, and cross-nested logit models can be obtained as special cases of the GEV model by appropriately specifying the function $G(\cdot)$, as shown in Appendix 3.A.
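As a quick check of the result above, consider one simple choice of generating function. The specific form of $G$ below is an assumption made for illustration (the parameterization used in Appendix 3.A may differ); it is homogeneous of degree $\mu$, as the derivation requires:

$$
G(y_1,\dots,y_m) = \sum_{i=1}^{m} y_i^{\mu}, \qquad G_j(y_1,\dots,y_m) = \frac{\partial G}{\partial y_j} = \mu\, y_j^{\mu-1}
$$

Substituting $y_i = e^{V_i}$ into the final expression for $p[j]$ gives

$$
p[j] = \frac{e^{V_j}\cdot \mu\, e^{(\mu-1)V_j}}{\mu \sum_{i=1}^{m} e^{\mu V_i}} = \frac{e^{\mu V_j}}{\sum_{i=1}^{m} e^{\mu V_i}}
$$

which has the multinomial logit form with scale parameter $\mu$.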

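mean and fully general variances and covariances:

$$
E[\varepsilon_j] = 0 \qquad \mathrm{Var}[\varepsilon_j] = \sigma_j^2 \qquad \mathrm{Cov}[\varepsilon_j,\varepsilon_h] = \sigma_{jh} \tag{3.3.57}
$$

Further characteristics of the multivariate normal r.v. are given in Appendix 3.B.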

Variances and covariances of the random residual vector $\varepsilon$ are elements of the $m\times m$ dispersion matrix $\Sigma$, where $m$ is the number of alternatives. The multivariate normal probability density of the residual vector $\varepsilon$ is given by

$$
f(\varepsilon) = \left[(2\pi)^m \det(\Sigma)\right]^{-1/2} \exp\left[-\tfrac{1}{2}\,\varepsilon^T \Sigma^{-1} \varepsilon\right] \tag{3.3.58}
$$

Perceived utilities $U_j$ are also jointly distributed according to a multivariate normal distribution with mean vector $V$ and variances and covariances equal to those of the residuals $\varepsilon_j$; $U \sim \mathrm{MVN}(V,\Sigma)$.

The choice probability of alternative $j$, $p[j]$, can be formally expressed in terms of the joint probability that utility $U_j$ will assume a value within an infinitesimal interval and that the utilities of the other alternatives will have lower values. This probability element must then be integrated over all possible values of $U_j$ to obtain $p[j]$ (see (3.3.54)):

$$
p[j] = \int_{U_1<U_j}\cdots\int_{U_j=-\infty}^{+\infty}\cdots\int_{U_m<U_j} \frac{\exp\left[-\tfrac{1}{2}(U-V)^T\Sigma^{-1}(U-V)\right]}{\left[(2\pi)^m\det(\Sigma)\right]^{1/2}}\; dU_1\cdots dU_m \tag{3.3.59}
$$

The probit model is invariant (see Sect. 3.2) if the matrix $\Sigma$ does not depend on the vector of systematic utilities $V$. In this case, the choice probability of a generic alternative depends only on systematic utility differences. Thus, Alternative Specific Attributes (ASA) and their coefficients (ASC) can be replaced by their differences with respect to the value of a reference alternative.

To illustrate the effect of variances and covariances on choice probabilities, consider the case of three alternatives ($m=3$), with systematic utilities equal to zero ($V_A=V_B=V_C=0$) and the following variance–covariance matrix.

$$
\Sigma = \begin{bmatrix} 1 & \sigma_{AB} & 0 \\ \sigma_{AB} & 1 & 0 \\ 0 & 0 & \sigma_C^2 \end{bmatrix}
$$

Figure 3.13 charts the probability $p[C]$ obtained with the probit model (3.3.59) for varying values of the parameters $\sigma_{AB}$ and $\sigma_C$. As the variance of $U_C$ increases compared with those of the other alternatives, the choice probability of $C$ also increases. The value of the random residual $\varepsilon_C$ eventually dominates the value of $V_C$, and the perceived utility $U_C$ is, with high probability, either much higher or much lower than the perceived utilities $U_A$ and $U_B$ ($\lim_{\sigma_C\to\infty} p[C] = 0.5$). Moreover, as the covariance (in this case identical with the correlation coefficient) between the residuals of alternatives $A$ and $B$ increases, the choice probability of alternative $C$ also increases, because $A$ and $B$ are increasingly perceived as a single alternative. The same effect was shown in Sects. 3.3.2 and 3.3.3 for the hierarchical logit model.

Fig. 3.13 Influence of the variance and covariance of residuals on probit choice probabilities
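The limiting behavior described for this three-alternative example can be checked by hand in a few special cases. The values below follow from symmetry alone, given $V_A=V_B=V_C=0$ and zero-mean normal residuals; they are a sanity check, not a substitute for evaluating (3.3.59):

$$
\begin{aligned}
\sigma_{AB}=0,\ \sigma_C=1 &: \quad p[C] = 1/3 \quad \text{(three i.i.d. residuals)} \\
\sigma_{AB}\to 1 &: \quad p[C] \to \Pr[\varepsilon_C > \varepsilon_A] = 1/2 \quad \text{($A$ and $B$ collapse into one alternative)} \\
\sigma_C\to\infty &: \quad p[C] \to \Pr[\varepsilon_C > 0] = 1/2
\end{aligned}
$$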

In general, the probit model yields choice probabilities similar to those obtained from logit and hierarchical logit models if the same variance–covariance matrix is assumed. Moreover, as mentioned above, it allows for greater flexibility in the specification of the covariance matrix, whose elements can assume any value and can be “directly” specified, unlike the logit-type models whose covariance matrix is indirectly defined through the choice network and model parameters.

The flexibility of the variance–covariance matrix can in fact be a problem in the practical use of the probit model. A variance–covariance matrix can contain up to $m(m+1)/2$ distinct values, as noted in Sect. 8.3.2, where $m$ is the number of choice alternatives. When $m$ is large, specification and calibration of all the possible values can be problematic. Different methods have been proposed to reduce the number of unknown variance–covariance matrix elements requiring estimation. All of these methods assume some structure underlying the random residuals. The parameters of this structure determine the elements of the variance–covariance matrix but are fewer in number than the total number of possible unknowns of such a matrix.

A first method, known as Factor Analytic Probit, expresses the vector of random residuals as a linear function of a vector $\zeta$ of independent standard normal variables:

$$
\varepsilon_j = \sum_{k=1}^{n} f_{jk}\,\zeta_k \tag{3.3.60a}
$$

$$
\varepsilon = F\zeta \tag{3.3.60b}
$$

where

$\varepsilon$ is the $(m\times 1)$ vector of multivariate normal random variables (factors) with elements $\varepsilon_j$: $\varepsilon \sim \mathrm{MVN}(0,\Sigma)$

$F$ is the $(m\times n)$ matrix of factor “loadings” with elements $f_{jk}$, mapping the vector $\zeta$ of standard normal random variables to the vector $\varepsilon$ of random residuals

$\zeta$ is the $(n\times 1)$ vector of independent and identically distributed standard normal random variables with elements $\zeta_k$: $\zeta \sim \mathrm{MVN}(0,I)$

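From (3.3.60a), the elements of the variance–covariance matrix $\Sigma$ of the random residuals $\varepsilon_j$ can be expressed as a function of the elements $f_{jk}$ of matrix $F$. Because the $\zeta_k$ are independent with zero mean and unit variance, $E[\zeta_k\zeta_l]=0$ for $k\neq l$ and the cross-product terms vanish:

$$
\mathrm{Var}[\varepsilon_j] = E\left[\varepsilon_j^2\right] = E\left[\sum_{k=1}^{n} f_{jk}^2\,\zeta_k^2\right] = \sum_{k=1}^{n} f_{jk}^2\cdot E\left[\zeta_k^2\right] = \sum_{k=1}^{n} f_{jk}^2 \tag{3.3.62}
$$

$$
\mathrm{Cov}[\varepsilon_j,\varepsilon_h] = E[\varepsilon_j\varepsilon_h] = E\left[\sum_{k=1}^{n} f_{jk}\zeta_k \cdot \sum_{l=1}^{n} f_{hl}\zeta_l\right] = \sum_{k=1}^{n} f_{jk}f_{hk}\cdot E\left[\zeta_k^2\right] = \sum_{k=1}^{n} f_{jk}f_{hk} \tag{3.3.63}
$$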

or in vectorial form:

$$
\Sigma = E[\varepsilon\varepsilon^T] = E[F\zeta\zeta^T F^T] = F\,E[\zeta\zeta^T]\,F^T = F I F^T = F F^T \tag{3.3.64}
$$

Because typically $n \ll m$, the number of unknown elements is reduced from $m(m+1)/2$ in the matrix $\Sigma$ to $m\cdot n$ in the matrix $F$. In the extreme case ($m=n$), the matrix $F$ is lower triangular and uniquely determined through the Cholesky factorization of the matrix $\Sigma$. A relevant application of the factor analytic representation of the probit model is in path choice, as shown in Sect. 4.3.3.1.

Another relevant application based on a particular specification of (3.3.60a) is known in the literature as the random coefficient probit. It is based on the assumption that the random residual $\varepsilon_j$ derives from the variability of utility function coefficients $\beta_k$ over the population of decision makers. In particular, for each individual $i$, coefficient $\beta_k^i$ is assumed equal to an average value $\beta_k$ plus a random residual $\eta_k^i$:

$$
\beta_k^i = \beta_k + \eta_k^i \qquad k=1,2,\dots,K
$$

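where $K$ is the total number of coefficients used to define the systematic utilities of the $m$ alternatives. By assuming that the $\eta_k^i$ are independently distributed normal variables with zero mean and variance $\sigma_k^2$,

$$
\eta_k^i \sim N\left(0,\sigma_k^2\right)\ \ \forall i,k \qquad \mathrm{Cov}\left[\eta_k^i,\eta_h^i\right] = 0\ \ \forall i,\ k\neq h
$$

it follows that

$$
U_j^i = V_j^i + \varepsilon_j^i = \sum_k \beta_k^i X_{kj}^i = \sum_k \beta_k X_{kj}^i + \sum_k \eta_k^i X_{kj}^i
$$

with:

$$
V_j^i = \sum_k \beta_k X_{kj}^i; \qquad \varepsilon_j^i = \sum_k \eta_k^i X_{kj}^i; \qquad \varepsilon^i \sim \mathrm{MVN}(0,\Sigma_\varepsilon) \tag{3.3.61}
$$

where $X_{kj}^i$ is the value of attribute $k$ in alternative $j$ (for individual $i$); it is equal to zero if attribute $X_k$ does not appear in the systematic utility of alternative $j$.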

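From comparison between (3.3.60a) and (3.3.61) it follows that $f_{jk}\zeta_k = \eta_k^i X_{kj}^i$, that is, writing $\eta_k^i = \sigma_k\zeta_k$, $f_{jk} = \sigma_k X_{kj}^i$, and by substituting into (3.3.62) and (3.3.63):

$$
\mathrm{Var}\left[\varepsilon_j^i\right] = \sum_k \left(X_{kj}^i\right)^2 \sigma_k^2 \tag{3.3.65}
$$

$$
\mathrm{Cov}\left[\varepsilon_j^i,\varepsilon_h^i\right] = \sum_k X_{kj}^i X_{kh}^i\,\sigma_k^2 \tag{3.3.66}
$$

that is, in vectorial form:

$$
F = X\,\Sigma_\eta^{1/2}
$$

and from (3.3.64):

$$
\Sigma_\varepsilon = X\,\Sigma_\eta\,X^T
$$

Using this approach, the number of unknown elements of the variance–covariance matrix is reduced from a possible maximum of $m(m+1)/2$ to the $K$ diagonal elements of the matrix $\Sigma_\eta$.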
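The relation between the attribute matrix, the coefficient variances, and the residual covariance matrix can be sketched numerically. The following is a minimal illustration, not part of the original text: the function name `random_coefficient_covariance`, the attribute values, and the standard deviations are all hypothetical, and the attribute matrix is stored with one row per alternative and one column per coefficient.

```python
def random_coefficient_covariance(X, sigma):
    """Build F and Sigma_eps for a random coefficient probit.

    X     : m x K list of lists, X[j][k] = value of attribute k in alternative j
            (0 if the attribute does not enter that alternative's utility)
    sigma : list of K standard deviations of the coefficient residuals eta_k
    Returns (F, Sigma_eps) with F[j][k] = sigma_k * X[j][k] and
    Sigma_eps = F F^T = X diag(sigma^2) X^T.
    """
    m, K = len(X), len(sigma)
    F = [[X[j][k] * sigma[k] for k in range(K)] for j in range(m)]
    Sigma_eps = [
        [sum(F[j][k] * F[h][k] for k in range(K)) for h in range(m)]
        for j in range(m)
    ]
    return F, Sigma_eps


# Hypothetical data: m = 3 alternatives, K = 2 attributes (say, time and cost).
X = [[10.0, 2.0],
     [15.0, 1.0],
     [20.0, 0.0]]      # the second attribute does not enter alternative 3
sigma = [0.1, 0.5]     # assumed standard deviations of the two coefficients

F, Sigma_eps = random_coefficient_covariance(X, sigma)
for row in Sigma_eps:
    print([round(v, 3) for v in row])
```

Only the two values in `sigma` would need to be estimated here, whereas an unrestricted 3 x 3 covariance matrix has six distinct elements.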

The flexibility of the probit model is achieved at the cost of computational complexity. The probit model does not possess analytical expressions for its choice probabilities inasmuch as there is no known closed-form solution of the integral (3.3.59).

Numerical integration methods are computationally burdensome when there are more than about five alternatives. Calculation of probit choice probabilities with several alternatives is typically carried out by approximation methods. In the following, three traditional approximate methods are described: the so-called Monte Carlo or Acceptance–Reject (AR) method, the GHK method, and the Clark approximation. However, it should be said that the last is computationally inefficient and is rarely used in practice.

The Monte Carlo method generates a sample of perceived utilities for the alternatives (these can be thought of as the utilities perceived for each alternative by a sample of decision-makers) and estimates the choice probability of each alternative $j$ as the fraction of times that $j$ is the alternative with maximum perceived utility.

More specifically, at the $k$th iteration, the method generates:

– A vector $\varepsilon^k = (\varepsilon_1^k,\dots,\varepsilon_m^k)^T$ of random residuals drawn from a zero-mean multivariate normal distribution with dispersion matrix $\Sigma$.

– A vector $U^k$ of perceived utilities: $U^k = V + \varepsilon^k$.

– A vector $p^k$ of deterministic alternative choice probabilities: $p^k = (0,\dots,1,\dots,0)$ where the value one is associated with the largest component of $U^k$ (the alternative with maximum perceived utility).

Consequently, after $n$ iterations, the sample estimate $\hat{p}[j]$ of the probability $p[j]$ is:

$$
\hat{p}[j] = \frac{1}{n}\sum_{k=1}^{n} p[j/\varepsilon^k] = \frac{n_j}{n} \tag{3.3.67}
$$

where $\varepsilon^k$ denotes the $k$th draw of vector $\varepsilon$ from an $\mathrm{MVN}(0,\Sigma)$ distribution, and $n_j$ is the number of times that alternative $j$ is the maximum perceived utility alternative in the sample. It can be shown that the estimator (3.3.67) is unbiased and efficient.

With the Monte Carlo method, each extraction can be considered as the execution of a generalized Bernoulli trial with $m$ possible outcomes, where outcome $j$ corresponds to alternative $j$ with maximum perceived utility, and occurs with probability $p[j]$. The joint sample frequency of the results is thus multinomially distributed and the sample variance of the estimate $\hat{p}[j]$ is:

$$
\mathrm{Var}\left[\hat{p}[j]\right] = \frac{1}{n}\,\hat{p}[j]\left(1-\hat{p}[j]\right) \tag{3.3.68}
$$

For large enough values of $n$, a confidence interval for $p[j]$ can be obtained by assuming that $p[j]$ is approximately distributed as a normal r.v. with mean $\hat{p}[j]$ given by (3.3.67) and variance given by (3.3.68).

In applications, drawing a random $m$-vector $\varepsilon$ from an $\mathrm{MVN}(0,\Sigma)$ distribution can be accomplished indirectly by drawing $m$ independent values from a standard normal $N(0,1)$ distribution by means of (3.3.60b). In practice, at the generic iteration $k$ the vector $\varepsilon^k$ of pseudorandom draws from a multivariate normal distribution $\mathrm{MVN}(0,\Sigma)$ can be obtained through:

– Drawing a vector $z^k$ of $m$ independent standard normal variables.

– Calculating $\varepsilon^k = F z^k$ where $F$ is known within a factor analytic approach or through a Cholesky factorization of the matrix $\Sigma$.
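The steps above can be sketched in a few lines of dependency-free Python. This is an illustrative sketch under stated assumptions rather than a reference implementation: the function names `cholesky` and `probit_monte_carlo` are hypothetical, $\Sigma$ is assumed symmetric and positive definite, and the example matrix is simply one instance of the three-alternative case with $\sigma_{AB}=0.5$ and $\sigma_C=2$.

```python
import math
import random


def cholesky(S):
    """Return lower-triangular C with C C^T = S (S symmetric positive definite)."""
    m = len(S)
    C = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1):
            s = S[i][j] - sum(C[i][k] * C[j][k] for k in range(j))
            C[i][j] = math.sqrt(s) if i == j else s / C[j][j]
    return C


def probit_monte_carlo(V, Sigma, n, seed=0):
    """Estimate probit choice probabilities by the frequency estimator (3.3.67).

    Returns (p_hat, std_err), where std_err is the square root of (3.3.68).
    """
    rng = random.Random(seed)
    m = len(V)
    C = cholesky(Sigma)                      # eps = C z, as in (3.3.60b)
    counts = [0] * m
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(m)]
        U = [V[j] + sum(C[j][k] * z[k] for k in range(j + 1)) for j in range(m)]
        counts[U.index(max(U))] += 1         # alternative with maximum utility
    p_hat = [c / n for c in counts]
    std_err = [math.sqrt(p * (1.0 - p) / n) for p in p_hat]
    return p_hat, std_err


# Three alternatives with V = 0, sigma_AB = 0.5, sigma_C = 2 (assumed values).
V = [0.0, 0.0, 0.0]
Sigma = [[1.0, 0.5, 0.0],
         [0.5, 1.0, 0.0],
         [0.0, 0.0, 4.0]]
p_hat, se = probit_monte_carlo(V, Sigma, n=20000)
for j, (p, s) in enumerate(zip(p_hat, se)):
    # Approximate 95% confidence interval under the normal approximation
    print(f"p[{j}] = {p:.3f}  +/- {1.96 * s:.3f}")
```

Because each draw contributes a 0/1 indicator, alternatives with very small probabilities may never be selected in a finite sample and receive an estimate of exactly zero.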

The Monte Carlo method, albeit simple to interpret and apply, exhibits some theoretical drawbacks that can be overcome by using different procedures for calculating probit probabilities. As described in Chap. 8, methods for random utility model estimation are based on specific theoretical properties of the function $p[\beta]$, that is, on how choice probabilities change with respect to model parameters. Namely, $p[\beta]$ is required to be twice differentiable and strictly positive. Because $p[\beta]$ does not exhibit a closed form for the probit model, these properties depend on how choice probabilities are simulated. Notably, when applying the Monte Carlo method, $p[\beta]$ is a step function (i.e., not continuous) and, in the presence of alternatives with low systematic utilities, it is not guaranteed to be strictly positive.

A possible solution is the smoothed Monte Carlo method, according to which the choice probability vector $p^k$ at the generic iteration $k$ is given by a multinomial logit probability vector $p^k = (p_1^k,\dots,p_m^k)$ with parameter $\theta$ rather than by a deterministic vector. This makes (3.3.67) a continuous, twice-differentiable function, given as the average of strictly positive logit probabilities rather than 0/1 values. Obviously, the probit choice probabilities provided by a smoothed Monte Carlo method are an approximation of the actual probit probabilities, with an approximation error that grows with the value of the variance parameter $\theta$. In other words, $\theta$ should be chosen so as to provide a satisfactory compromise between speed and stability of convergence, which increase with $\theta$, and reliability of the simulated choice probabilities, which decreases with $\theta$. These concepts are extended in Sect. 3.3.7 when describing the mixed logit model.
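A smoothed simulator of this kind might look as follows. This is a sketch under assumptions: the text does not give the exact parameterization of the logit smoother, so the form $\exp(U_j/\theta)/\sum_i \exp(U_i/\theta)$ used here is assumed, and the function name `smoothed_probit` and the factor matrix `F` (with $\varepsilon = F\zeta$ as in (3.3.60b)) are illustrative choices.

```python
import math
import random


def smoothed_probit(V, F, theta, n, seed=0):
    """Smoothed Monte Carlo estimate of probit choice probabilities.

    V     : list of m systematic utilities
    F     : m x r factor matrix, residuals are generated as eps = F zeta
    theta : smoothing parameter of the logit kernel (assumed form, see lead-in)
    """
    rng = random.Random(seed)
    m, r = len(F), len(F[0])
    p = [0.0] * m
    for _ in range(n):
        zeta = [rng.gauss(0.0, 1.0) for _ in range(r)]
        U = [V[j] + sum(F[j][k] * zeta[k] for k in range(r)) for j in range(m)]
        u_max = max(U)                       # subtract the max for numerical stability
        w = [math.exp((u - u_max) / theta) for u in U]
        total = sum(w)
        for j in range(m):
            p[j] += w[j] / (total * n)       # average of logit probabilities
    return p


# Hypothetical case: the third alternative has a low systematic utility.
I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(smoothed_probit([0.0, 0.0, -3.0], I3, theta=0.5, n=5000))
```

Every alternative receives a strictly positive weight at every draw, so the averaged probabilities stay positive even for alternatives that a plain frequency count might never select.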

Another possible solution to the operative problems of the Monte Carlo method lies in the GHK method, considered in the literature one of the most stable and accurate. Unlike the Monte Carlo method, which supplies an estimate for the choice probabilities of all the alternatives simultaneously, the GHK method determines the probability of choosing a single alternative on each occasion. This makes it naturally burdensome if the number of alternatives is very high. To illustrate the mechanism, let us consider initially the case of a choice set consisting of three alternatives, and let us suppose we wish to determine the probability of choosing alternative 1. Allowing for (3.2.2a) and the theoretical properties of invariant random utility models, the perceived utility of the other two alternatives may be expressed in differential terms with respect to the utility of the considered alternative:

$$
\begin{aligned}
U_2 - U_1 &= (V_2 - V_1) + (\varepsilon_2 - \varepsilon_1) \ \rightarrow\ U_{21} = V_{21} + \varepsilon_{21} \\
U_3 - U_1 &= (V_3 - V_1) + (\varepsilon_3 - \varepsilon_1) \ \rightarrow\ U_{31} = V_{31} + \varepsilon_{31}
\end{aligned}
$$

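The covariance matrix $\Sigma_1$ of random residuals $\varepsilon_{21}$ and $\varepsilon_{31}$ may be derived directly from matrix $\Sigma$ of residuals $\varepsilon_1,\dots,\varepsilon_3$. Then, because we are dealing with a symmetric and positive definite matrix, it may be expressed by Cholesky factorization as $\Sigma_1 = CC^T$, given that:

$$
C = \begin{bmatrix} c_{11} & 0 \\ c_{21} & c_{22} \end{bmatrix}
$$

Recalling what was stated above concerning the Monte Carlo method, if $z_1$ and $z_2$ are two standardized normal r.v. then we may write:

$$
\begin{aligned}
\varepsilon_{21} &= c_{11}\cdot z_1 \ \rightarrow\ U_{21} = V_{21} + c_{11}\cdot z_1 \\
\varepsilon_{31} &= c_{21}\cdot z_1 + c_{22}\cdot z_2 \ \rightarrow\ U_{31} = V_{31} + c_{21}\cdot z_1 + c_{22}\cdot z_2
\end{aligned}
$$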

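and the probability of choosing the first alternative may be reformulated as follows.

$$
\begin{aligned}
p[1] &= \Pr\left[(U_{21}<0)\cap(U_{31}<0)\right] \\
&= \Pr\left[(V_{21}+c_{11}z_1<0)\cap(V_{31}+c_{21}z_1+c_{22}z_2<0)\right] \\
&= \Pr\left[V_{21}+c_{11}z_1<0\right]\cdot\Pr\left[(V_{31}+c_{21}z_1+c_{22}z_2<0)\,/\,(V_{21}+c_{11}z_1<0)\right] \\
&= \Pr\left[z_1 < -\frac{V_{21}}{c_{11}}\right]\cdot\Pr\left[z_2 < -\frac{V_{31}+c_{21}z_1}{c_{22}}\ \Big/\ z_1 < -\frac{V_{21}}{c_{11}}\right]
\end{aligned}
$$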

If $F$ stands for the standard normal cumulative distribution function, the probability product previously written becomes

$$
p[1] = F\left(-\frac{V_{21}}{c_{11}}\right)\cdot\int_{-\infty}^{-V_{21}/c_{11}} F\left(-\frac{V_{31}+c_{21}z_1}{c_{22}}\right) f(z_1)\,dz_1 \tag{3.3.69}
$$

where, consistently with the conditioning on $z_1 < -V_{21}/c_{11}$, $f(z_1)$ is the density of the standard normal random variable $z_1$ truncated at $-V_{21}/c_{11}$. The first factor of (3.3.69) may be directly obtained from probability tables of standard normal random variables, and the integral may be calculated numerically by performing at the generic iteration $k$ the following steps.

– A draw $z_1^k$ is generated of the standard normal random variable $z_1$ truncated at $-V_{21}/c_{11}$ (to generate a $z_1$ truncated at $-V_{21}/c_{11}$ it is enough to generate a value $u$ uniformly distributed on $(0,1)$ and calculate $z_1 = F^{-1}\left(u\,F(-V_{21}/c_{11})\right)$).

– From the standard normal probability tables we calculate the value

$$
i_k = F\left(-\frac{V_{31}+c_{21}z_1^k}{c_{22}}\right)
$$

It may be demonstrated that an unbiased and efficient estimate of the integral of (3.3.69) is obtained by calculating the average of the values $i_k$ over a certain number of iterations. The product of the two factors thus calculated, inserted into (3.3.69), supplies an unbiased and efficient estimate of the choice probability $p[1]$ sought.
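The three-alternative procedure can be sketched with the Python standard library, whose `statistics.NormalDist` provides the standard normal `cdf` and `inv_cdf`. The function name `ghk_three_alternatives` is hypothetical, the truncated draw uses a uniform variate scaled by $F(-V_{21}/c_{11})$, and the small clamp on the probability passed to `inv_cdf` is an implementation safeguard added here, not part of the method.

```python
import math
import random
from statistics import NormalDist

_N = NormalDist()  # standard normal: F = _N.cdf, F^{-1} = _N.inv_cdf


def ghk_three_alternatives(V, Sigma, n, seed=0):
    """GHK-style estimate of p[1] for three alternatives, following (3.3.69)."""
    rng = random.Random(seed)
    V21, V31 = V[1] - V[0], V[2] - V[0]
    # Covariance matrix Sigma_1 of (eps2 - eps1, eps3 - eps1)
    s11 = Sigma[1][1] + Sigma[0][0] - 2.0 * Sigma[0][1]
    s22 = Sigma[2][2] + Sigma[0][0] - 2.0 * Sigma[0][2]
    s12 = Sigma[1][2] - Sigma[0][1] - Sigma[0][2] + Sigma[0][0]
    # Cholesky factor of Sigma_1 (assumed positive definite)
    c11 = math.sqrt(s11)
    c21 = s12 / c11
    c22 = math.sqrt(s22 - c21 ** 2)
    a = -V21 / c11
    Fa = _N.cdf(a)                           # first factor of (3.3.69)
    total = 0.0
    for _ in range(n):
        u = rng.random()
        # Clamp keeps the argument strictly inside (0, 1) for inv_cdf
        q = min(max(u * Fa, 1e-12), 1.0 - 1e-12)
        z1 = _N.inv_cdf(q)                   # draw of z1 truncated above at a
        total += _N.cdf(-(V31 + c21 * z1) / c22)
    return Fa * total / n


I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# With equal utilities and i.i.d. residuals the result should be close to 1/3.
print(ghk_three_alternatives([0.0, 0.0, 0.0], I3, n=5000))
```

Each iteration contributes a value strictly between 0 and 1 rather than a 0/1 indicator, which is why the simulated probability varies smoothly with the systematic utilities and stays strictly positive.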

Generalization of the procedure to the case of $m$ alternatives is immediate. In this regard, suffice it to think that for a generic alternative $j$ (with $j>3$) we obtain:

$$
\begin{aligned}
p[j] &= \Pr\left[U_{ij}<0\ \ \forall i\neq j\right] \\
&= \Pr\left[z_1 < -\frac{V_{1j}}{c_{11}}\right]\cdot\Pr\left[z_2 < -\frac{V_{2j}+c_{21}z_1}{c_{22}}\ \Big/\ z_1 < -\frac{V_{1j}}{c_{11}}\right] \\
&\quad\cdot\Pr\left[z_3 < -\frac{V_{3j}+c_{31}z_1+c_{32}z_2}{c_{33}}\ \Big/\ z_2 < -\frac{V_{2j}+c_{21}z_1}{c_{22}},\ z_1 < -\frac{V_{1j}}{c_{11}}\right]\cdots
\end{aligned}
$$

The Clark approximation, another traditional method for calculating probit choice probabilities, is based on an approximation for the maximum of a set of normal random variables (the maximum is of course itself a random variable). The procedure is first illustrated by referring to a choice among three alternatives. In this case, perceived utilities $U_1$, $U_2$, and $U_3$ are distributed according to a multivariate normal distribution with mean vector $V = (V_1, V_2, V_3)^T$ and the following variance–covariance matrix.

$$
\Sigma = \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \sigma_{13} \\ \sigma_{21} & \sigma_2^2 & \sigma_{23} \\ \sigma_{31} & \sigma_{32} & \sigma_3^2 \end{bmatrix}
$$

Suppose the choice probability of alternative 3, $p[3]$, is to be computed. Clark's results express the mean $V_{12}$ and the variance $S_{12}^2$ of the random variable $U_{12} = \max(U_1, U_2)$ as

$$
V_{12} = V_2 + (V_1 - V_2)\,F(\alpha) + \gamma f(\alpha) \tag{3.3.70}
$$

$$
S_{12}^2 = \mathrm{var}[U_{12}] = m_{12} - V_{12}^2
$$

where $m_{12}$ is the second moment around zero of the variable $U_{12}$, and is given by

$$
m_{12} = V_2^2 + \sigma_2^2 + \left(V_1^2 + \sigma_1^2 - V_2^2 - \sigma_2^2\right)F(\alpha) + (V_1 + V_2)\,\gamma f(\alpha) \tag{3.3.71}
$$

The constants $\gamma$ and $\alpha$ in expressions (3.3.70) and (3.3.71) are, respectively, the standard deviation of the random variable $(U_1 - U_2)$:

$$
\gamma = \left(\sigma_1^2 + \sigma_2^2 - 2\sigma_{12}\right)^{1/2}
$$

and the mean standardized value of the random variable $(U_1 - U_2)$:

$$
\alpha = (V_1 - V_2)/\gamma
$$

The symbols $f(\alpha)$ and $F(\alpha)$ denote, respectively, the value of the probability density function and probability distribution function of a standard normal r.v. $N(0,1)$ evaluated at $\alpha$:

$$
f(\alpha) = (2\pi)^{-1/2}\exp\left(-\alpha^2/2\right) \qquad F(\alpha) = \int_{-\infty}^{\alpha} f(x)\,dx
$$

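Clark's formulas also give the covariance between variables $U_j$ and $U_{12\dots j-1}$ as:

$$
S_{j.12\dots i} = \mathrm{cov}(U_j, U_{12\dots i}) = \sigma_{ij} + \left(S_{j.12\dots i-1} - \sigma_{ij}\right)F(\alpha)
$$

where $i = j-1$. Thus the covariance between variables $U_3$ and $U_{12}$ is:

$$
S_{3.12} = \mathrm{cov}(U_3, U_{12}) = \sigma_{23} + (\sigma_{13} - \sigma_{23})\,F(\alpha)
$$

The probability of choosing alternative 3 is:

$$
p[3] = \Pr[U_3 \geq U_{12}] = \Pr[U_{12} - U_3 \leq 0] \tag{3.3.72}
$$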

Although $U_{12}$ is not in fact normally distributed, Clark's method assumes that it can be satisfactorily approximated by a normal r.v. having mean $V_{12}$, variance $S_{12}^2$, and covariance $S_{3.12}$ with the normal r.v. $U_3$. Thus, the choice probability (3.3.72) can be evaluated using standard results on the distribution of the difference of two normal variables (Appendix 3.B.2):

$$
p[3] = F\left(\frac{V_3 - V_{12}}{\left(\sigma_3^2 + S_{12}^2 - 2S_{3.12}\right)^{1/2}}\right) \tag{3.3.73}
$$

Choice probabilities for more than three alternatives can be calculated by sequentially applying the procedure described above. The probability of choosing the generic alternative $j$ can be obtained by computing sequentially the mean, variance, and covariance of nested pairs of perceived utilities ordered in such a way that $j$ is the last alternative. For example, the mean and variance of $U_{12} = \max(U_1, U_2)$ as well as its covariance with $U_3$ are computed first. Subsequently the mean and variance of the variable $U_{123} = \max(U_3, U_{12})$ are computed together with its covariance with $U_4$, and so on until the comparison is made between:

$$
U_{12\dots j-1} = \max\left\{U_{j-1}, \max\left[U_{j-2}, \dots \max(U_1, U_2)\right]\right\}
$$

and $U_j$. At this point, the probability $p[j]$ is obtained by applying expression (3.3.73). The entire sequence has to be repeated to calculate the probability of each alternative.
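For the three-alternative case, expressions (3.3.70) to (3.3.73) translate almost line by line into code. The sketch below is illustrative only: the function name `clark_p3` is hypothetical, it handles exactly three alternatives, and it assumes $\gamma > 0$ (that is, $U_1$ and $U_2$ are not perfectly correlated with equal variances).

```python
import math
from statistics import NormalDist

_N = NormalDist()  # standard normal: F = _N.cdf, f = _N.pdf


def clark_p3(V, Sigma):
    """Clark approximation of p[3] for three alternatives."""
    V1, V2, V3 = V
    s11, s22, s33 = Sigma[0][0], Sigma[1][1], Sigma[2][2]
    s12, s13, s23 = Sigma[0][1], Sigma[0][2], Sigma[1][2]

    gamma = math.sqrt(s11 + s22 - 2.0 * s12)   # std. dev. of U1 - U2 (assumed > 0)
    alpha = (V1 - V2) / gamma
    F, f = _N.cdf(alpha), _N.pdf(alpha)

    V12 = V2 + (V1 - V2) * F + gamma * f                         # (3.3.70)
    m12 = (V2 ** 2 + s22
           + (V1 ** 2 + s11 - V2 ** 2 - s22) * F
           + (V1 + V2) * gamma * f)                              # (3.3.71)
    S12_sq = m12 - V12 ** 2                                      # variance of U12
    S3_12 = s23 + (s13 - s23) * F                                # cov(U3, U12)

    return _N.cdf((V3 - V12) / math.sqrt(s33 + S12_sq - 2.0 * S3_12))  # (3.3.73)


I3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# Equal utilities and i.i.d. residuals: the exact probability is 1/3,
# so the printed value shows the size of the approximation error in this case.
print(clark_p3([0.0, 0.0, 0.0], I3))
```

Probabilities for the other alternatives would be obtained by reordering the alternatives so that the one of interest comes last and repeating the calculation, as described above.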