The Probit Model - Some Random Utility Models

Random Utility Theory

3.3 Some Random Utility Models

3.3.6 The Probit Model

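it follows that

$$\lim_{\varepsilon_1,\dots,\varepsilon_m\to+\infty} F(\varepsilon_1,\dots,\varepsilon_m) = \lim_{\varepsilon_1,\dots,\varepsilon_m\to+\infty} \exp\left[-G\left(e^{-\varepsilon_1},\dots,e^{-\varepsilon_m}\right)\right] = \exp\left[-G(0,\dots,0)\right] = \exp[-0] = 1$$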

The third property is easily verified, because F(·), defined by (3.3.51), is a continuous function.

Furthermore, it can be shown that evaluating (3.3.55), with F defined as in (3.3.51), actually yields expression (3.3.52) for the choice probabilities of a GEV model.

Indeed, substituting (3.3.51) into expression (3.3.55) and using the homogeneity of G(·) (of degree μ) and of G_j(·) (of degree μ − 1), it follows that

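$$
\begin{aligned}
p[j] &= \int_{\varepsilon_j=-\infty}^{+\infty} \exp\left[-G\left(e^{V_1-V_j-\varepsilon_j},\dots,e^{V_m-V_j-\varepsilon_j}\right)\right] G_j\left(e^{V_1-V_j-\varepsilon_j},\dots,e^{V_m-V_j-\varepsilon_j}\right) e^{-\varepsilon_j}\,d\varepsilon_j \\
&= \int_{\varepsilon_j=-\infty}^{+\infty} \exp\left\{-\left[e^{-(V_j+\varepsilon_j)}\right]^{\mu} G\left(e^{V_1},\dots,e^{V_m}\right)\right\} \left[e^{-(V_j+\varepsilon_j)}\right]^{\mu-1} G_j\left(e^{V_1},\dots,e^{V_m}\right) e^{-\varepsilon_j}\,d\varepsilon_j \\
&= \frac{e^{V_j}\, G_j\left(e^{V_1},\dots,e^{V_m}\right)}{\mu\, G\left(e^{V_1},\dots,e^{V_m}\right)} \left[\exp\left\{-\left[e^{-(V_j+\varepsilon_j)}\right]^{\mu} G\left(e^{V_1},\dots,e^{V_m}\right)\right\}\right]_{-\infty}^{+\infty} \\
&= \frac{e^{V_j}\, G_j\left(e^{V_1},\dots,e^{V_m}\right)}{\mu\, G\left(e^{V_1},\dots,e^{V_m}\right)}
\end{aligned}
$$

which is (3.3.52). The last step holds because the bracketed exponential tends to 1 as ε_j → +∞ and to 0 as ε_j → −∞.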
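The derivation above can be checked numerically. The sketch below evaluates the first integral for p[j] with the trapezoidal rule and compares it with the closed form (3.3.52). The generating function G(y) = Σ_i y_i^μ is an assumption chosen for illustration: it is homogeneous of degree μ, and for it (3.3.52) reduces to e^{μV_j}/Σ_i e^{μV_i}. The function names, integration window, and step count are also illustrative choices, not part of the text.

```python
import math

def gev_integral(V, j, mu, lo=-30.0, hi=30.0, steps=20_000):
    """Trapezoidal evaluation of the first integral in the derivation of p[j].

    Uses the illustrative generating function G(y) = sum_i y_i**mu,
    homogeneous of degree mu (so G_j is homogeneous of degree mu - 1).
    """
    def G(y):
        return sum(yi ** mu for yi in y)

    def G_j(y):
        return mu * y[j] ** (mu - 1.0)   # partial derivative of G with respect to y_j

    h = (hi - lo) / steps
    total = 0.0
    for s in range(steps + 1):
        eps = lo + s * h
        y = [math.exp(Vi - V[j] - eps) for Vi in V]   # arguments e^(V_i - V_j - eps_j)
        value = math.exp(-G(y)) * G_j(y) * math.exp(-eps)
        weight = 0.5 if s in (0, steps) else 1.0      # trapezoidal end-point weights
        total += weight * value
    return total * h

def gev_closed_form(V, j, mu):
    """Expression (3.3.52), e^V_j G_j(e^V) / (mu G(e^V)), for the same G."""
    y = [math.exp(Vi) for Vi in V]
    G = sum(yi ** mu for yi in y)
    G_j = mu * y[j] ** (mu - 1.0)
    return y[j] * G_j / (mu * G)

V = [0.5, -0.2, 1.0]
for mu in (0.5, 1.0, 2.0):
    print(mu, [round(gev_integral(V, j, mu), 6) for j in range(3)],
          [round(gev_closed_form(V, j, mu), 6) for j in range(3)])
```

The window [−30, 30] is wide enough here. On the left the integrand decays doubly exponentially, and on the right it decays like e^{−με_j}.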

Multinomial logit, single-level hierarchical logit, multilevel hierarchical logit, and cross-nested logit models can be obtained as special cases of the GEV model by appropriately specifying the function G(·), as shown in Appendix 3.A.

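In the probit model, the random residuals ε_j are assumed to be jointly distributed as a multivariate normal (MVN) random variable with zero mean and fully general variances and covariances:

$$E[\varepsilon_j] = 0, \qquad \mathrm{Var}[\varepsilon_j] = \sigma_j^2, \qquad \mathrm{Cov}[\varepsilon_j, \varepsilon_h] = \sigma_{jh} \qquad (3.3.57)$$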

Further characteristics of the multivariate normal r.v. are given in Appendix 3.B.

Variances and covariances of the random residual vector ε are the elements of the m×m dispersion matrix Σ, where m is the number of alternatives. The multivariate normal probability density of the residual vector ε is given by

$$f(\varepsilon) = \left[(2\pi)^m \det(\Sigma)\right]^{-1/2} \exp\left[-\tfrac{1}{2}\,\varepsilon^{T}\Sigma^{-1}\varepsilon\right] \qquad (3.3.58)$$

Perceived utilities U_j are also jointly distributed according to a multivariate normal distribution, with mean vector V and variances and covariances equal to those of the residuals ε_j: U ∼ MVN(V, Σ).

The choice probability of alternative j, p[j], can be formally expressed in terms of the joint probability that utility U_j takes a value within an infinitesimal interval and that the utilities of the other alternatives take lower values. This probability element must then be integrated over all possible values of U_j to obtain p[j] (see (3.3.54)):

$$p[j] = \int_{U_j=-\infty}^{+\infty} \int_{U_1<U_j} \cdots \int_{U_m<U_j} \frac{\exp\left[-\tfrac{1}{2}(U-V)^{T}\Sigma^{-1}(U-V)\right]}{\left[(2\pi)^m \det(\Sigma)\right]^{1/2}}\, dU_1 \cdots dU_m \qquad (3.3.59)$$

where the inner integrals run over U_i < U_j for every i ≠ j. The probit model is invariant (see Sect. 3.2) if the matrix Σ does not depend on the vector of systematic utilities V. In this case, the choice probability of a generic alternative depends only on differences between systematic utilities. Thus, Alternative Specific Attributes (ASA) and their coefficients (ASC) can be replaced by their differences with respect to the values of a reference alternative.

To illustrate the effect of variances and covariances on choice probabilities, consider the case of three alternatives (m = 3), with systematic utilities equal to zero (V_A = V_B = V_C = 0) and the following variance–covariance matrix:

$$\Sigma = \begin{bmatrix} 1 & \sigma_{AB} & 0 \\ \sigma_{AB} & 1 & 0 \\ 0 & 0 & \sigma_C^2 \end{bmatrix}$$

Figure 3.13 charts the probability p[C] obtained with the probit model (3.3.59) for varying values of the parameters σ_AB and σ_C. As the variance of U_C increases relative to those of the other alternatives, the choice probability of C also increases. The value of the random residual ε_C eventually dominates the value of V_C, and the perceived utility U_C is, with high probability, either much higher or much lower than the perceived utilities U_A and U_B (lim_{σ_C→∞} p[C] = 0.5). Moreover, as the covariance between the residuals of alternatives A and B increases, the choice probability of alternative C also increases, because A and B are increasingly perceived as a single alternative. Here the covariance coincides with the correlation coefficient, because both variances equal one.

Fig. 3.13 Influence of the variance and covariance of residuals on probit choice probabilities

The same effect was shown in Sects. 3.3.2 and 3.3.3 for the hierarchical logit model.
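The trend in Fig. 3.13 can be reproduced with a short simulation. The sketch below builds the correlated residuals of the example matrix explicitly as ε_A = z_1, ε_B = σ_AB z_1 + (1 − σ_AB²)^{1/2} z_2 and ε_C = σ_C z_3, then counts how often C has the highest perceived utility. As a cross-check, it also evaluates the standard bivariate normal orthant formula Pr[D_1 > 0, D_2 > 0] = 1/4 + arcsin(ρ)/(2π). Here D_1 = U_C − U_A and D_2 = U_C − U_B, whose correlation is ρ = (σ_C² + σ_AB)/(σ_C² + 1). The function names, sample size, and seed are illustrative assumptions.

```python
import math
import random

def prob_C_simulated(sigma_AB, sigma_C, n=100_000, seed=0):
    """Acceptance-reject estimate of p[C] for V_A = V_B = V_C = 0 and the example matrix."""
    if not -1.0 <= sigma_AB <= 1.0:
        raise ValueError("sigma_AB must lie in [-1, 1] (unit variances for A and B)")
    rng = random.Random(seed)
    root = math.sqrt(1.0 - sigma_AB ** 2)
    wins = 0
    for _ in range(n):
        z1, z2, z3 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        u_a = z1                          # Var = 1
        u_b = sigma_AB * z1 + root * z2   # Var = 1, Cov(u_a, u_b) = sigma_AB
        u_c = sigma_C * z3                # Var = sigma_C**2, independent of A and B
        if u_c > max(u_a, u_b):
            wins += 1
    return wins / n

def prob_C_exact(sigma_AB, sigma_C):
    """Orthant probability Pr[U_C - U_A > 0, U_C - U_B > 0] for zero-mean normals."""
    rho = (sigma_C ** 2 + sigma_AB) / (sigma_C ** 2 + 1.0)
    return 0.25 + math.asin(rho) / (2.0 * math.pi)

for s_ab, s_c in [(0.0, 1.0), (0.8, 1.0), (0.0, 3.0), (0.0, 10.0)]:
    print(s_ab, s_c, round(prob_C_simulated(s_ab, s_c), 3), round(prob_C_exact(s_ab, s_c), 3))
```

Both limits discussed in the text follow from the formula. As σ_C → ∞, or as σ_AB → 1, ρ → 1 and p[C] → 1/4 + (π/2)/(2π) = 0.5.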

In general, the probit model yields choice probabilities similar to those obtained from logit and hierarchical logit models if the same variance–covariance matrix is assumed. Moreover, as mentioned above, it allows greater flexibility in the specification of the covariance matrix. Its elements can take any values, provided that Σ remains a valid (positive definite) covariance matrix, and they can be specified “directly”. In logit-type models, by contrast, the covariance matrix is defined indirectly through the choice network and the model parameters.

The flexibility of the variance–covariance matrix can in fact be a problem in the practical use of the probit model. A variance–covariance matrix can contain up to m(m+1)/2 distinct values, as noted in Sect. 8.3.2, where m is the number of choice alternatives. When m is large, specification and calibration of all the possible values can be problematic. Different methods have been proposed to reduce the number of unknown variance–covariance matrix elements requiring estimation.

All of these methods assume some structure underlying the random residuals. The parameters of this structure determine the elements of the variance–covariance matrix but are fewer in number than the total number of possible unknowns of such a matrix.

A first method, known as Factor Analytic Probit, expresses the vector of random residuals as a linear function of a vector ζ of independent standard normal variables:

$$\varepsilon_j = \sum_{k=1}^{n} f_{jk}\,\zeta_k \qquad (3.3.60a)$$

$$\varepsilon = F\zeta \qquad (3.3.60b)$$

where

– ε is the (m×1) vector of multivariate normal random variables (factors) with elements ε_j: ε ∼ MVN(0, Σ)

– F is the (m×n) matrix of factor “loadings” with elements f_jk, mapping the vector ζ of standard normal random variables to the vector ε of random residuals

– ζ is the (n×1) vector of independent and identically distributed standard normal random variables with elements ζ_k: ζ ∼ MVN(0, I)

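From (3.3.60a), the elements of the variance–covariance matrix Σ of the random residuals ε_j can be expressed as functions of the elements f_jk of matrix F:

$$\mathrm{Var}[\varepsilon_j] = E\left[\varepsilon_j^2\right] = E\left[\left(\sum_{k=1}^{n} f_{jk}\,\zeta_k\right)^{2}\right] = \sum_{k=1}^{n} f_{jk}^2\, E\left[\zeta_k^2\right] = \sum_{k=1}^{n} f_{jk}^2 \qquad (3.3.62)$$

$$\mathrm{Cov}[\varepsilon_j, \varepsilon_h] = E[\varepsilon_j\varepsilon_h] = E\left[\sum_{k=1}^{n} f_{jk}\zeta_k \cdot \sum_{k=1}^{n} f_{hk}\zeta_k\right] = \sum_{k=1}^{n} f_{jk} f_{hk}\, E\left[\zeta_k^2\right] = \sum_{k=1}^{n} f_{jk} f_{hk} \qquad (3.3.63)$$

The cross terms vanish because E[ζ_k ζ_l] = 0 for k ≠ l, and E[ζ_k²] = 1.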

or in vectorial form:

$$\Sigma = E\left[\varepsilon\varepsilon^{T}\right] = E\left[F\zeta\zeta^{T}F^{T}\right] = F\,E\left[\zeta\zeta^{T}\right]F^{T} = F I F^{T} = FF^{T} \qquad (3.3.64)$$

Because typically n ≪ m, the number of unknown elements is reduced from m(m+1)/2 in the matrix Σ to m·n in the matrix F. In the extreme case (m = n), the matrix F can be taken to be lower triangular and is then uniquely determined through the Cholesky factorization of the matrix Σ. A relevant application of the factor analytic representation of the probit model is in path choice, as shown in Sect. 4.3.3.1.

Another relevant application, based on a particular specification of (3.3.60a), is known in the literature as the random coefficient probit. It is based on the assumption that the random residual ε_j derives from the variability of the utility function coefficients β_k over the population of decision makers. In particular, for each individual i, coefficient β_k^i is assumed equal to an average value β_k plus a random residual η_k^i:

$$\beta_k^i = \beta_k + \eta_k^i \qquad k = 1, 2, \dots, K$$

where K is the total number of coefficients used to define the systematic utilities of the m alternatives. By assuming that the η_k^i are independently distributed normal variables with zero mean and variance σ_k²,

$$\eta_k^i \sim N\left(0, \sigma_k^2\right) \quad \forall i, k \qquad \mathrm{Cov}\left[\eta_k^i, \eta_h^i\right] = 0 \quad \forall i,\; k \neq h$$

it follows that

$$U_j^i = V_j^i + \varepsilon_j^i = \sum_{k} \beta_k^i X_{kj}^i = \sum_{k} \beta_k X_{kj}^i + \sum_{k} \eta_k^i X_{kj}^i$$

with

$$V_j^i = \sum_{k} \beta_k X_{kj}^i, \qquad \varepsilon_j^i = \sum_{k} \eta_k^i X_{kj}^i, \qquad \varepsilon^i \sim MVN(0, \Sigma_\varepsilon) \qquad (3.3.61)$$

where X_kj is the value of attribute k in alternative j. It is equal to zero if attribute X_k does not appear in the systematic utility of alternative j.

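Comparing (3.3.60a) and (3.3.61), with ζ_k = η_k^i/σ_k, it follows that

$$f_{jk}^i = \sigma_k X_{kj}^i$$

and by substituting into (3.3.62) and (3.3.63):

$$\mathrm{Var}\left[\varepsilon_j^i\right] = \sum_{k} \left(X_{kj}^i\right)^2 \sigma_k^2 \qquad (3.3.65)$$

$$\mathrm{Cov}\left[\varepsilon_j^i, \varepsilon_h^i\right] = \sum_{k} X_{kj}^i X_{kh}^i\, \sigma_k^2 \qquad (3.3.66)$$

that is, in vectorial form:

$$F = X\Sigma_\eta^{1/2}$$

and, from (3.3.64):

$$\Sigma_\varepsilon = X\Sigma_\eta X^{T}$$

where X is the (m×K) matrix with element X_kj^i in row j and column k, and Σ_η = diag(σ_1², …, σ_K²). Using this approach, the number of unknown elements of the variance–covariance matrix is reduced from a possible maximum of m(m+1)/2 to the K variances σ_k² of the matrix Σ_η.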
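As a small illustration of the random coefficient covariance, the sketch below computes Σ_ε in two ways and checks that they agree. The first uses the loadings F = X Σ_η^{1/2}, so that Σ_ε = FF^T. The second applies (3.3.65)–(3.3.66) element by element. The attribute values and standard deviations are hypothetical numbers, not data from the text. The code stores X row by row, so X[j][k] holds X_kj (attribute k of alternative j).

```python
def matmul(A, B):
    """Plain-list matrix product A @ B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def sigma_eps_from_loadings(X, sigma):
    """Sigma_eps = F F^T with F = X Sigma_eta^(1/2), i.e. f_jk = sigma_k * X_kj."""
    F = [[x_jk * s_k for x_jk, s_k in zip(row, sigma)] for row in X]
    return matmul(F, transpose(F))

def sigma_eps_elementwise(X, sigma):
    """Same matrix from (3.3.65)-(3.3.66): sum_k X_kj X_kh sigma_k^2."""
    m = len(X)
    return [[sum(X[j][k] * X[h][k] * sigma[k] ** 2 for k in range(len(sigma)))
             for h in range(m)] for j in range(m)]

# Hypothetical data: 3 alternatives (rows) x 2 attributes (columns), e.g. time and cost
X = [[30.0, 2.0],
     [45.0, 1.0],
     [20.0, 3.5]]
sigma = [0.02, 0.3]   # hypothetical standard deviations of the random coefficients
S = sigma_eps_from_loadings(X, sigma)
m, K = len(X), len(sigma)
print("free parameters:", K, "instead of up to", m * (m + 1) // 2)
```

Σ_ε is generally a full matrix even though only K = 2 variances are estimated. Two alternatives share covariance whenever they contain a common attribute.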

The flexibility of the probit model is achieved at the cost of computational complexity. The probit model has no analytical expressions for its choice probabilities, because there is no known closed-form solution of the integral (3.3.59).

Numerical integration methods are computationally burdensome when there are more than about five alternatives. Calculation of probit choice probabilities with several alternatives is therefore typically carried out by approximation methods. In the following, three traditional approximation methods are described: the so-called Monte Carlo or Acceptance–Reject (AR) method, the GHK method, and the Clark approximation. However, it should be said that the last is computationally inefficient and is rarely used in practice.

The Monte Carlo method generates a sample of perceived utilities for the alternatives (these can be thought of as the utilities perceived for each alternative by a sample of decision-makers). It then estimates the choice probability of each alternative j as the fraction of times that j is the alternative with maximum perceived utility.

More specifically, at the kth iteration, the method generates:

– A vector ε^k = (ε_1^k, …, ε_m^k)^T of random residuals drawn from a zero-mean multivariate normal distribution with dispersion matrix Σ.

– A vector U^k of perceived utilities: U^k = V + ε^k.

– A vector p^k of deterministic choice probabilities: p^k = (0, …, 1, …, 0), where the value one is associated with the largest component of U^k (the alternative with maximum perceived utility).

Consequently, after n iterations, the sample estimate p̂[j] of the probability p[j] is:

$$\hat{p}[j] = \frac{1}{n}\sum_{k=1}^{n} p[j \mid \varepsilon^k] = \frac{n_j}{n} \qquad (3.3.67)$$

where ε^k denotes the kth draw of vector ε from an MVN(0, Σ) distribution, and n_j is the number of times that alternative j is the maximum perceived utility alternative in the sample. It can be shown that the estimator (3.3.67) is unbiased and efficient.

With the Monte Carlo method, each extraction can be considered as the execution of a generalized Bernoulli trial with m possible outcomes. Outcome j corresponds to alternative j having maximum perceived utility, and occurs with probability p[j]. The joint sample frequency of the outcomes is thus multinomially distributed, and the sample variance of the estimate p̂[j] is:

$$\mathrm{Var}\left[\hat{p}[j]\right] = \frac{1}{n}\,\hat{p}[j]\left(1 - \hat{p}[j]\right) \qquad (3.3.68)$$

For large enough values of n, a confidence interval for p[j] can be obtained by assuming that p[j] is approximately distributed as a normal r.v. with mean p̂[j], given by (3.3.67), and variance given by (3.3.68).
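A minimal sketch of the acceptance–reject simulator follows, combining the estimator (3.3.67), its variance (3.3.68), and a normal-approximation 95% interval. It assumes the residuals are generated as ε = Fz from a loading matrix F supplied by the caller, which may be a Cholesky factor or factor-analytic loadings. The name `ar_probit`, the seed, the sample size, and the 1.96 quantile are illustrative choices.

```python
import math
import random

def ar_probit(V, F, n=50_000, seed=1):
    """Acceptance-reject simulator of probit probabilities, eqs. (3.3.67)-(3.3.68).

    V: systematic utilities (length m).
    F: m x q loading matrix with Sigma = F F^T (Cholesky factor or factor loadings).
    Returns (p_hat, var_hat, ci95), one entry per alternative.
    """
    rng = random.Random(seed)
    m, q = len(V), len(F[0])
    counts = [0] * m
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(q)]                          # independent N(0, 1)
        U = [V[j] + sum(F[j][k] * z[k] for k in range(q)) for j in range(m)]  # U^k = V + F z
        counts[max(range(m), key=U.__getitem__)] += 1                       # p^k is a 0/1 vector
    p_hat = [c / n for c in counts]                                         # (3.3.67): n_j / n
    var_hat = [p * (1.0 - p) / n for p in p_hat]                            # (3.3.68)
    half = [1.96 * math.sqrt(v) for v in var_hat]                           # normal approx., 95%
    ci95 = [(p - w, p + w) for p, w in zip(p_hat, half)]
    return p_hat, var_hat, ci95

I3 = [[1.0 if r == c else 0.0 for c in range(3)] for r in range(3)]
p_hat, var_hat, ci95 = ar_probit([0.0, 0.0, 0.0], I3)   # i.i.d. residuals, equal utilities
print(p_hat, ci95)
```

With equal systematic utilities and i.i.d. residuals, each estimate should be close to 1/3. The interval half-width shrinks like 1/√n, which is the slow convergence that motivates the alternatives discussed next.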

In applications, drawing a random m-vector ε from an MVN(0, Σ) distribution can be accomplished indirectly by drawing m independent values from a standard normal N(0, 1) distribution and applying (3.3.60b). In practice, at the generic iteration k, the vector ε^k of pseudorandom draws from a multivariate normal distribution MVN(0, Σ) can be obtained by:

– Drawing a vector z^k of m independent standard normal variables.

– Calculating ε^k = F z^k, where F is known within a factor analytic approach or is obtained through a Cholesky factorization of the matrix Σ.
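The two steps above can be sketched as follows, assuming Σ is given as a symmetric positive definite list of lists. `cholesky`, `draw_mvn`, and `sample_cov` are hypothetical helper names, and the example matrix is arbitrary. The lower-triangular factor L from the row-by-row Cholesky recursion plays the role of F in ε^k = F z^k, which is the m = n case of the factor analytic representation. The sample covariance of the draws should then approach Σ.

```python
import math
import random

def cholesky(S):
    """Lower-triangular L with S = L L^T, for a symmetric positive definite S."""
    m = len(S)
    L = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(i + 1):
            s = S[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][i] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L

def draw_mvn(L, rng):
    """One draw eps^k = L z^k from MVN(0, L L^T)."""
    z = [rng.gauss(0.0, 1.0) for _ in range(len(L))]       # z^k: m independent N(0, 1)
    return [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(len(L))]

def sample_cov(draws):
    """Unbiased sample covariance matrix of a list of equal-length vectors."""
    n, m = len(draws), len(draws[0])
    mean = [sum(d[i] for d in draws) / n for i in range(m)]
    return [[sum((d[i] - mean[i]) * (d[h] - mean[h]) for d in draws) / (n - 1)
             for h in range(m)] for i in range(m)]

Sigma = [[1.0, 0.5, 0.2],
         [0.5, 2.0, 0.3],
         [0.2, 0.3, 1.5]]
L = cholesky(Sigma)
rng = random.Random(4)
draws = [draw_mvn(L, rng) for _ in range(50_000)]
S_hat = sample_cov(draws)   # should be close to Sigma
```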

The Monte Carlo method, albeit simple to interpret and apply, exhibits some theoretical drawbacks that can be overcome by using different procedures for calculating probit probabilities. As described in Chap. 8, methods for random utility model estimation rely on specific theoretical properties of the function p[β], that is, on how choice probabilities change with respect to model parameters. Namely, p[β] is required to be twice differentiable and strictly positive. Because p[β] has no closed form for the probit model, these properties depend on how choice probabilities are simulated. Notably, when applying the Monte Carlo method, p[β] is a step function (i.e., not continuous). In the presence of alternatives with low systematic utilities, it is also not guaranteed to be strictly positive.

A possible solution is the smoothed Monte Carlo method. At the generic iteration k, the choice probability vector p^k = (p_1^k, …, p_m^k) is given by a multinomial logit probability vector with parameter θ, rather than by a deterministic 0/1 vector. This makes (3.3.67) a continuous, twice-differentiable function, given as the average of strictly positive logit probabilities rather than of 0/1 values. Obviously, the choice probabilities provided by a smoothed Monte Carlo are only an approximation of the actual probit probabilities, and the approximation error grows with the value of the variance parameter θ. In other words, θ should be chosen to strike a satisfactory compromise between two effects. Speed and stability of convergence increase with θ, while the reliability of the simulated choice probabilities decreases with θ. These concepts are extended in Sect. 3.3.7 when describing the mixed logit model.
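The smoothed simulator differs from the acceptance–reject one only in what each draw contributes. The sketch below assumes the logit is written with θ dividing the utilities, exp(U_j/θ)/Σ_i exp(U_i/θ), consistent with θ being described as a variance parameter. The parametrization, the function name `smoothed_probit`, and the default sample size are assumptions for illustration. The maximum utility is subtracted before exponentiating, which changes nothing mathematically but avoids overflow for small θ.

```python
import math
import random

def smoothed_probit(V, F, theta, n=20_000, seed=2):
    """Logit-smoothed simulator: each draw contributes MNL probabilities
    exp(U_j / theta) / sum_i exp(U_i / theta) instead of a 0/1 vector."""
    if theta <= 0.0:
        raise ValueError("theta must be positive")
    rng = random.Random(seed)
    m, q = len(V), len(F[0])
    acc = [0.0] * m
    for _ in range(n):
        z = [rng.gauss(0.0, 1.0) for _ in range(q)]
        U = [V[j] + sum(F[j][k] * z[k] for k in range(q)) for j in range(m)]
        top = max(U)                                     # shift for numerical stability
        w = [math.exp((u - top) / theta) for u in U]
        total = sum(w)
        for j in range(m):
            acc[j] += w[j] / total                       # strictly positive logit share
    return [a / n for a in acc]

I3 = [[1.0 if r == c else 0.0 for c in range(3)] for r in range(3)]
print(smoothed_probit([1.0, 0.0, 0.0], I3, theta=0.05))
print(smoothed_probit([1.0, 0.0, 0.0], I3, theta=5.0))
```

A small θ gives results close to the 0/1 frequencies of the acceptance–reject method. A large θ pulls every probability toward 1/m, which illustrates the bias side of the trade-off.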

Another possible solution to the operational problems of the Monte Carlo method is the GHK method, considered in the literature one of the most stable and accurate. The Monte Carlo method supplies estimates of the choice probabilities of all the alternatives simultaneously. The GHK method, by contrast, determines the probability of choosing a single alternative at a time, which naturally makes it burdensome when the number of alternatives is very high. To illustrate the mechanism, let us first consider a choice set of three alternatives, and suppose we wish to determine the probability of choosing alternative 1. Recalling (3.2.2a) and the theoretical properties of invariant random utility models, the perceived utilities of the other two alternatives may be expressed as differences with respect to the utility of the considered alternative:

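$$U_2 - U_1 = (V_2 - V_1) + (\varepsilon_2 - \varepsilon_1) \;\rightarrow\; U_{21} = V_{21} + \varepsilon_{21}$$

$$U_3 - U_1 = (V_3 - V_1) + (\varepsilon_3 - \varepsilon_1) \;\rightarrow\; U_{31} = V_{31} + \varepsilon_{31}$$

The covariance matrix Σ_1 of the random residuals ε_21 and ε_31 may be derived directly from the matrix Σ of the residuals ε_1, ε_2, ε_3. Its elements are Var[ε_21] = σ_2² + σ_1² − 2σ_12, Var[ε_31] = σ_3² + σ_1² − 2σ_13, and Cov[ε_21, ε_31] = σ_23 − σ_12 − σ_13 + σ_1². Then, because Σ_1 is symmetric and positive definite, it may be expressed through the Cholesky factorization Σ_1 = CC^T, where

$$C = \begin{bmatrix} c_{11} & 0 \\ c_{21} & c_{22} \end{bmatrix}$$

Recalling what was stated above concerning the Monte Carlo method, if z_1 and z_2 are two independent standard normal r.v.s, then we may write:

$$\varepsilon_{21} = c_{11} z_1 \;\rightarrow\; U_{21} = V_{21} + c_{11} z_1$$

$$\varepsilon_{31} = c_{21} z_1 + c_{22} z_2 \;\rightarrow\; U_{31} = V_{31} + c_{21} z_1 + c_{22} z_2$$

and the probability of choosing the first alternative may be reformulated as follows:

$$
\begin{aligned}
p[1] &= \Pr\left[(U_{21} < 0) \cap (U_{31} < 0)\right] \\
&= \Pr\left[(V_{21} + c_{11} z_1 < 0) \cap (V_{31} + c_{21} z_1 + c_{22} z_2 < 0)\right] \\
&= \Pr\left[V_{21} + c_{11} z_1 < 0\right] \cdot \Pr\left[V_{31} + c_{21} z_1 + c_{22} z_2 < 0 \mid V_{21} + c_{11} z_1 < 0\right] \\
&= \Pr\left[z_1 < -\frac{V_{21}}{c_{11}}\right] \cdot \Pr\left[z_2 < -\frac{V_{31} + c_{21} z_1}{c_{22}} \;\middle|\; z_1 < -\frac{V_{21}}{c_{11}}\right]
\end{aligned}
$$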

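If F(·) denotes the standard normal cumulative distribution function and f(·) the standard normal density, the probability product above becomes

$$p[1] = F\left(-\frac{V_{21}}{c_{11}}\right) \int_{-\infty}^{-V_{21}/c_{11}} F\left(-\frac{V_{31} + c_{21} z_1}{c_{22}}\right) \frac{f(z_1)}{F\left(-V_{21}/c_{11}\right)}\, dz_1 \qquad (3.3.69)$$

where f(z_1)/F(−V_21/c_11) is the density of z_1 truncated at −V_21/c_11. The first factor of (3.3.69) may be obtained directly from standard normal probability tables. The integral may be calculated numerically by performing the following steps at the generic iteration k.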

– A draw z_1^k of the standard normal random variable z_1 truncated at −V_21/c_11 is generated. To generate a z_1 truncated at −V_21/c_11, it is enough to generate a uniform random number u in (0, 1) and calculate z_1 = F⁻¹(u · F(−V_21/c_11)).

– From the standard normal probability tables, the value i^k = F(−(V_31 + c_21 z_1^k)/c_22) is calculated.

It can be shown that an unbiased and efficient estimate of the integral in (3.3.69) is obtained by averaging the values i^k over a certain number of iterations. The product of the two factors thus calculated, inserted into (3.3.69), supplies an unbiased and efficient estimate of the choice probability p[1].
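A sketch of the GHK steps above for a choice among three alternatives follows, written for any target alternative j by relabeling the other two as a and b. It assumes Python's standard `statistics.NormalDist` for F(·) and F⁻¹(·). The function name `ghk_three`, the draw count, and the seed are illustrative. The truncated draw uses the inverse-CDF method with a uniform u, as in the first step. The argument of `inv_cdf` is clamped slightly inside (0, 1), because `inv_cdf` rejects the endpoints.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist()  # standard normal N(0, 1): provides cdf() and inv_cdf()

def ghk_three(V, S, j, n=20_000, seed=3):
    """GHK estimate of p[j] for a choice among three alternatives (j in {0, 1, 2}).

    V: systematic utilities; S: 3 x 3 covariance matrix of the residuals.
    """
    a, b = [i for i in range(3) if i != j]
    # Differences U_aj = U_a - U_j and U_bj = U_b - U_j, and their covariance matrix Sigma_j
    V_aj, V_bj = V[a] - V[j], V[b] - V[j]
    s_aa = S[a][a] + S[j][j] - 2.0 * S[a][j]
    s_bb = S[b][b] + S[j][j] - 2.0 * S[b][j]
    s_ab = S[a][b] - S[a][j] - S[b][j] + S[j][j]
    # 2 x 2 Cholesky factor C = [[c11, 0], [c21, c22]] of Sigma_j
    c11 = math.sqrt(s_aa)
    c21 = s_ab / c11
    c22 = math.sqrt(s_bb - c21 ** 2)
    first = PHI.cdf(-V_aj / c11)          # Pr[z1 < -V_aj / c11], computed exactly
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        u = 1.0 - rng.random()            # uniform in (0, 1]
        p = min(max(u * first, 1e-300), 1.0 - 1e-16)  # keep inv_cdf argument inside (0, 1)
        z1 = PHI.inv_cdf(p)               # z1 ~ N(0, 1) truncated above at -V_aj / c11
        total += PHI.cdf(-(V_bj + c21 * z1) / c22)    # i^k
    return first * total / n              # F(-V_aj / c11) times the average of i^k

Sigma = [[1.0, 0.5, 0.2], [0.5, 2.0, 0.3], [0.2, 0.3, 1.5]]
V = [0.5, 0.0, -0.3]
print([round(ghk_three(V, Sigma, j), 4) for j in range(3)])
```

Each call estimates a single probability, as the text notes. The three estimates are computed independently and should sum to approximately one.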

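Generalization of the procedure to the case of m alternatives is immediate. It suffices to note that, for a generic alternative j (with j > 3, so that alternatives 1, 2, 3, … all differ from j), we obtain:

$$
\begin{aligned}
p[j] = \Pr\left[U_{ij} < 0 \;\; \forall i \neq j\right] = {} & \Pr\left[z_1 < -\frac{V_{1j}}{c_{11}}\right] \cdot \Pr\left[z_2 < -\frac{V_{2j} + c_{21} z_1}{c_{22}} \;\middle|\; z_1 < -\frac{V_{1j}}{c_{11}}\right] \\
& \cdot \Pr\left[z_3 < -\frac{V_{3j} + c_{31} z_1 + c_{32} z_2}{c_{33}} \;\middle|\; \left(z_2 < -\frac{V_{2j} + c_{21} z_1}{c_{22}}\right) \cap \left(z_1 < -\frac{V_{1j}}{c_{11}}\right)\right] \cdots
\end{aligned}
$$

where U_ij = U_i − U_j, V_ij = V_i − V_j, and the coefficients c are the elements of the Cholesky factor of the covariance matrix of the residuals ε_ij.

The Clark approximation, another traditional method for calculating probit choice probabilities, is based on an approximation for the maximum of a set of normal random variables (the maximum is of course itself a random variable). The procedure is first illustrated for a choice among three alternatives. In this case, the perceived utilities U_1, U_2, and U_3 are distributed according to a multivariate normal distribution with mean vector V = (V_1, V_2, V_3)^T and the following variance–covariance matrix:

$$\Sigma = \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \sigma_{13} \\ \sigma_{21} & \sigma_2^2 & \sigma_{23} \\ \sigma_{31} & \sigma_{32} & \sigma_3^2 \end{bmatrix}$$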

Suppose the choice probability of alternative 3, p[3], is to be computed. Clark's results express the mean V_12 and the variance S_12² of the random variable U_12 = max(U_1, U_2) as

$$V_{12} = V_2 + (V_1 - V_2)F(\alpha) + \gamma f(\alpha) \qquad (3.3.70)$$

$$S_{12}^2 = \mathrm{var}[U_{12}] = m_{12} - V_{12}^2$$

where m_12 is the second moment about zero of the variable U_12, given by

$$m_{12} = V_2^2 + \sigma_2^2 + \left(V_1^2 + \sigma_1^2 - V_2^2 - \sigma_2^2\right)F(\alpha) + (V_1 + V_2)\gamma f(\alpha) \qquad (3.3.71)$$

The constants γ and α in expressions (3.3.70) and (3.3.71) are, respectively, the standard deviation of the random variable (U_1 − U_2):

$$\gamma = \left(\sigma_1^2 + \sigma_2^2 - 2\sigma_{12}\right)^{1/2}$$

and the standardized mean of the random variable (U_1 − U_2):

$$\alpha = (V_1 - V_2)/\gamma$$

The symbols f(α) and F(α) denote, respectively, the value of the probability density function and of the cumulative distribution function of a standard normal r.v. N(0, 1) evaluated at α:

$$f(\alpha) = (2\pi)^{-1/2}\exp\left(-\alpha^2/2\right) \qquad F(\alpha) = \int_{-\infty}^{\alpha} f(x)\,dx$$
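As a worked illustration of (3.3.70) and (3.3.71), take two independent alternatives with V_1 = V_2 = 0, σ_1² = σ_2² = 1 and σ_12 = 0. These numbers are our own, not from the text.

```latex
\begin{aligned}
&\gamma = \sqrt{2}, \qquad \alpha = 0, \qquad F(0) = \tfrac{1}{2}, \qquad f(0) = (2\pi)^{-1/2} \\
&V_{12} = 0 + (0 - 0)\,\tfrac{1}{2} + \sqrt{2}\,(2\pi)^{-1/2} = \pi^{-1/2} \approx 0.564 \\
&m_{12} = 0 + 1 + (0 + 1 - 0 - 1)\,\tfrac{1}{2} + (0 + 0)\,\sqrt{2}\,(2\pi)^{-1/2} = 1 \\
&S_{12}^2 = m_{12} - V_{12}^2 = 1 - \pi^{-1} \approx 0.682
\end{aligned}
```

These are the exact mean and variance of the maximum of two independent standard normal variables. Clark's moment formulas are exact, and the approximation enters only when U_12 is subsequently treated as normal. Adding a third independent alternative (V_3 = 0, σ_3² = 1, S_3.12 = 0), (3.3.73) gives p[3] = F(−0.564/1.297) ≈ F(−0.435) ≈ 0.332. This is slightly below the exact value 1/3.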

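Clark's formulas also give the covariance between the variable U_j and U_12…i = max(U_1, …, U_i) through the recursion

$$S_{j.12\ldots i} = \mathrm{cov}\left(U_j, U_{12\ldots i}\right) = \sigma_{ij} + \left(S_{j.12\ldots i-1} - \sigma_{ij}\right)F(\alpha)$$

applied for i = 2, …, j − 1, starting from S_j.1 = σ_1j. Here α is the standardized mean difference between U_12…i−1 and U_i. Thus the covariance between the variables U_3 and U_12 is:

$$S_{3.12} = \mathrm{cov}(U_3, U_{12}) = \sigma_{23} + (\sigma_{13} - \sigma_{23})F(\alpha)$$

The probability of choosing alternative 3 is: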

$$p[3] = \Pr\left[U_3 \geq U_{12}\right] = \Pr\left[U_{12} - U_3 \leq 0\right] \qquad (3.3.72)$$

Although U_12 is not in fact normally distributed, Clark's method assumes that it can be satisfactorily approximated by a normal r.v. with mean V_12, variance S_12², and covariance S_3.12 with the normal r.v. U_3. Thus, the choice probability (3.3.72) can be evaluated using standard results on the distribution of the difference of two normal variables (Appendix 3.B.2):

$$p[3] = F\left(\frac{V_3 - V_{12}}{\left(\sigma_3^2 + S_{12}^2 - 2S_{3.12}\right)^{1/2}}\right) \qquad (3.3.73)$$

Choice probabilities for more than three alternatives can be calculated by applying the procedure above sequentially. The probability of choosing the generic alternative j is obtained by sequentially computing the mean, variance, and covariance of nested pairs of perceived utilities, ordered so that j is the last alternative. For example, the mean and variance of U_12 = max(U_1, U_2), as well as its covariance with U_3, are computed first. Next, the mean and variance of the variable U_123 = max(U_3, U_12) are computed together with its covariance with U_4, and so on until the comparison is made between:

$$U_{12\ldots j-1} = \max\left(U_{j-1}, \max\left(U_{j-2}, \ldots \max(U_1, U_2)\right)\right)$$

and U_j. At this point, the probability p[j] is obtained by applying expression (3.3.73). The entire sequence has to be repeated to calculate the probability of each alternative.
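The sequential procedure can be sketched as follows. It assumes Python's `statistics.NormalDist` for f(·) and F(·), and `clark_probability` is a hypothetical name. The other alternatives are folded into a running maximum in increasing index order. At each step the maximum is approximated by a normal r.v. via (3.3.70)–(3.3.71), and its covariances with all utilities are updated with the recursion above. Expression (3.3.73) is applied at the end. Because Clark's approximation can depend on the order in which the maxima are taken, a different ordering may give slightly different values.

```python
import math
from statistics import NormalDist

PHI = NormalDist()  # standard normal: pdf() and cdf()

def clark_probability(V, S, j):
    """Clark approximation of p[j] for a probit model with mean vector V and covariance S.

    The other alternatives are folded in increasing index order into a running maximum,
    each time approximated by a normal r.v. via (3.3.70)-(3.3.71); (3.3.73) is applied last.
    """
    order = [i for i in range(len(V)) if i != j]
    first = order[0]
    mean, var = V[first], S[first][first]
    cov = {k: S[first][k] for k in range(len(V))}   # cov(running max, U_k) for every k
    for i in order[1:]:
        gamma = math.sqrt(var + S[i][i] - 2.0 * cov[i])   # std. dev. of (running max - U_i)
        alpha = (mean - V[i]) / gamma
        F, f = PHI.cdf(alpha), PHI.pdf(alpha)
        new_mean = V[i] + (mean - V[i]) * F + gamma * f                   # (3.3.70)
        second = (V[i] ** 2 + S[i][i]
                  + (mean ** 2 + var - V[i] ** 2 - S[i][i]) * F
                  + (mean + V[i]) * gamma * f)                             # (3.3.71)
        cov = {k: S[i][k] + (cov[k] - S[i][k]) * F for k in cov}        # covariance recursion
        mean, var = new_mean, second - new_mean ** 2
    return PHI.cdf((V[j] - mean) / math.sqrt(S[j][j] + var - 2.0 * cov[j]))  # (3.3.73)

I3 = [[1.0 if r == c else 0.0 for c in range(3)] for r in range(3)]
print([round(clark_probability([0.0, 0.0, 0.0], I3, j), 4) for j in range(3)])
```

For three independent alternatives with equal utilities, each value comes out slightly below 1/3. This reflects the normal approximation of the maximum, so the three probabilities do not sum exactly to one.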

From the document: Book Transportation Systems Analysis Models and Applications (pages 137-146)