Concept of Copula - Joint Distribution Modelling of Vehicle Dynamic Parameters Using Copula

5.4 Joint Distribution Modelling of Vehicle Dynamic Parameters Using Copula

5.4.1 Concept of Copula

A copula is a joint distribution function with standard uniform univariate margins U[0,1] that links a stochastic multivariate relationship with pre-specified univariate marginal distribution functions. The copula theory can be employed to construct 2-dimensional probability distributions for any marginal distributions. Given two continuous random variables (𝑋₁, 𝑋₂) with cumulative distribution functions (𝐹₁(𝑥₁), 𝐹₂(𝑥₂)), a joint bivariate cumulative distribution function (CDF) 𝐹(𝑥₁, 𝑥₂) can be generated using the following expression:

𝐹(𝑥₁, 𝑥₂) = 𝐶_𝜃(𝑢₁ = 𝐹₁(𝑥₁), 𝑢₂ = 𝐹₂(𝑥₂))………5-3 The function 𝐶 (copula) is a CDF joining marginal distributions 𝐹₁(𝑥₁), 𝐹₂(𝑥₂). In case of marginal distributions of continuous variables, by differentiating 𝐹(𝑥₁, 𝑥₂), the 2-dimensional copula density function can be obtained as:

𝑓(𝑥₁, 𝑥₂) = 𝑐_𝜃[𝐹₁(𝑥₁), 𝐹₂(𝑥₂)]𝑓₁(𝑥₁)𝑓₂(𝑥₂)………..………5-4 where 𝑐_𝜃[𝐹₁(𝑥₁), 𝐹₂(𝑥₂)] is the density of the copula and is obtained by computing the second partial derivative of the copula on the two variables:

𝑐_𝜃[𝐹₁(𝑥₁), 𝐹₂(𝑥₂)] = ^𝜕²^𝐶

𝜕𝑢1𝜕𝑢2 and 𝑢₁ = 𝐹₁(𝑥₁), 𝑢₂ = 𝐹₁(𝑥₂)……….…..5-5 Because the dependence structures modelled through the copulas are separated from the univariate marginal distributions, an appropriate copula function is the one which suitably elucidates the dependence features of the data. Several techniques have been used in the literature to evaluate the degree of correlation between the random variables. There are two commonly used non-parametric methods of detecting correlation that are based on the concept of concordant and discordant pairs of observations are, Kendall’s τ (Kendall, 1938) and Spearman’s ρs (Spearman, 1904). Kendall’s τ measure of dependence between the random variables (X1, X2) is defined as the probability of concordance minus the probability of discordance and is expressed as:

𝜏(𝑋₁, 𝑋₂) = 𝑃 ((𝑋₁− 𝑋̃)(𝑋₁ ₁− 𝑋̃) > 0) − 𝑃 ((𝑋₂ ₁− 𝑋̃)(𝑋₁ ₁− 𝑋̃) < 0)……….….5-6 ₂ Where, (𝑋̃, 𝑋₁ ̃) is an independent copy of (𝑋₂ ₁, 𝑋₂). The second non-parametric measure of

dependence i.e. Spearman’s rank correlation coefficient ρs is three times the difference between the probability of concordance and the probability of discordance:

𝜌_𝑆(𝑋₁, 𝑋₂) = 3 (ℙ ((𝑋₁− 𝑋̃₁)(𝑋₂− 𝑋̆₂) > 0) − ℙ ((𝑋₁− 𝑋̃₁)(𝑋₂− 𝑋̆₂) < 0))…………5-7 where (𝑋₁, 𝑋₂), (𝑋̃₁, 𝑋̃₂) and (𝑋̆₁, 𝑋̆₂) are the three independent identically distributed random vectors each with a common joint distribution function 𝐹(. , . ) and margins 𝐹₁ and 𝐹₂. More specifically, let 𝐴₁and 𝐴₂denote the number of concordant and discordant pairs for 𝑛 observations; and 𝑅(𝑋̃₁)and 𝑅(𝑋̃₂) denote the ranks of a pair of variables (𝑋₁, 𝑋₂), Kendall’s 𝜏 and Spearman’s 𝜌_𝑆 can then be defined as

𝜏 = ^𝐴¹^{− 𝐴}²

0.5𝑛(𝑛−1)………..……….5-8

𝜌_𝑆 = ^∑ ^𝑅(𝑋̃^1𝑖^)𝑅(𝑋̃^2𝑖^)−𝑛(

𝑛+1 2 )² 𝑛𝑖=1

√(∑^𝑛_𝑖=1𝑅(𝑋̃_1𝑖)²−𝑛(^𝑛+1₂ )²)(∑^𝑛_𝑖=1𝑅(𝑋̃_2𝑖)²−𝑛(^𝑛+1₂ )²)

………..5-9

Besides Kendall’s 𝜏 and Spearman’s 𝜌_𝑆 measure of dependence, the traditional Pearson’s product-moment correlation coefficient 𝑟 that is a measure of linear dependence between 𝑋₁ and 𝑋₂ poses several limitations. Unlike the rank-based dependence measures, the Pearson’s correlation coefficient is a parametric technique that largely depends upon the marginal distributions of the population. It performs better than rank-based measures only for bivariate elliptical distributions (including bivariate normal distribution). Therefore, considering the advantages of rank-based dependence measures that both are not dependent on the margins, Kendall’s 𝜏 and Spearman’s 𝜌_𝑆 are commonly used to characterize the dependence structures in the copula literature, rather than the traditional Pearson’s correlation coefficient.

A rich set of copula types have been generated using the method of inversion, geometric methods and algebraic methods (Nelsen, 2006) and each of the copula is characterised by a different dependence structure 𝜃 on the data. Also it is to be noted that Kendall’s 𝜏 and Spearman’s 𝜌_𝑆 are functions of the respective copula and the dependence parameter, and are not dependent on the marginal distributions. A brief overview of the copula functions, that includes the Gaussian copula, the Farlie-Gumbel-Morgenstern (FGM) copula and various Archimedean copulas like the Gumbel-Hougaard copula, the Frank copula, the Clayton copula, the Ali-Mikhail-Haq (AHM) copula and the Joe copula, and dependence structures of each copula type is presented in Table 5-3.

Table 5-3 Characteristics and association measures of bivariate copulas

Copula type Function 𝐶_𝜃(𝑢, 𝑣) 𝜃 domain Kendall’s 𝜏 and domain Spearman’s 𝜌_𝑆 and domain

Gaussian ∫ ∫ 1

2𝜋√1 − 𝜃²

𝜑⁻¹(𝑣)

−∞

𝜑⁻¹(𝑢)

−∞

exp (−𝑠²− 2𝜃𝑠𝑡 + 𝑡²

2(1 − 𝜃²) )𝑑𝑠𝑑𝑡 𝜃 𝜖 [−1,1]

𝜋sin⁻¹𝜃 𝜏 𝜖 [−1,1]

𝜋sin⁻¹(𝜃 2) 𝜌_𝑆 𝜖 [−1,1]

F-G-M 𝑢₁𝑢₂[1 + 𝜃(1 − 𝑢₁)(1 − 𝑢₂)] 𝜃 𝜖 [−1,1]

2 9𝜃 𝜏 𝜖 [−2

9,2 9]

1 3𝜃 𝜌_𝑆 𝜖 [−1

3,1 3] Gumbel

exp (−[(− ln 𝑢₁)^𝜃+ (− ln 𝑢₂)^𝜃]¹^⁄^𝜃) 𝜃 𝜖 [−1, ∞)

1 −1 𝜃 𝜏 𝜖 (0,1)

No closed form

Frank

−1

𝜃ln (1 +(exp(−𝜃𝑢₁) − 1)(exp(−𝜃𝑢₂) − 1)

exp(−𝜃) − 1 ) 𝜃 𝜖 (−∞, ∞)

1 −4

𝜃[1 − 𝐷_𝐹(𝜃)]

𝜏 𝜖 (−1,1)

1 −12

𝜃 (𝐷₁|𝜃) − 𝐷₂(𝜃) 𝜌_𝑆 𝜖 [−1,1]

Clayton

(𝑢₁^−𝜃+ 𝑢₂^−𝜃 − 1)^{−1 𝜃}^⁄ 𝜃 𝜖 (0, ∞) 𝜃 𝜃 + 2 𝜏 𝜖 (0,1)

Complicated expression 𝜌_𝑆 𝜖 (−1,1)

A-M-H 𝑢₁𝑢₂

1 − 𝜃(1 − 𝑢₁)(1 − 𝑢₂)

𝜃 𝜖 [−1,1] 3𝜃 − 2

3𝜃 −2(1 − 𝜃)²

3𝜃² ln(1 − 𝜃) 𝜏 𝜖 (−0.182, 0.333)

[[12(1 + 𝜃)𝑑𝑖{ln}(1

− 𝜃)

− 24(1 − 𝜃) ln(1

− 𝜃)]/𝜃²]

−3(𝜃 + 12)

𝜌_𝑆 𝜖 [−0.271, 0.478] 𝜃 Joe 1 − [(1 − 𝑢₁)^𝜃+ (1 − 𝑢₂)^𝜃 − (1 − 𝑢₁)^𝜃(1 − 𝑢₂)^𝜃]¹^⁄^𝜃 𝜃 𝜖 [1, ∞) 1 +4

𝜃𝐷_𝐽(𝜃) 𝜏 𝜖 [0,1)

No simple form 𝜌_𝑆 𝜖 [0,1) 𝐷_𝐹(𝜃) = ^𝑘

𝜃^𝑘∫ ^𝑡^𝑘

exp(𝑡)−1𝑑𝑡, 𝑘 = 1, 2

𝜃

0 ; 𝑑𝑖{ln}(𝑥) = ∫₁^𝑥_1−𝑡^{ln 𝑡}𝑑𝑡

In this study, the selection process of copula functions for bivariate and trivariate probability analysis was performed intuitively (Chowdhary et al., 2011). This process resulted from designated acceptable dependence domains and nature of association of the data under consideration. A set of copula functions have also been employed to build the 2-D probability distribution by assessing suitable goodness-of-fit measures whereas for trivariate analysis, the elliptical Gaussian copula has been considered in this study. The trivariate Gaussian copula can be defined as

𝐶_𝜬(𝑢, 𝑣, 𝑤) = 𝜙_𝜬[𝜙⁻¹(𝑢), 𝜙⁻¹(𝑣), 𝜙⁻¹(𝑤)]………5-10 in which

𝜙_𝜬[𝜙⁻¹(𝑢), 𝜙⁻¹(𝑣), 𝜙⁻¹(𝑤)] = ∫ ∫ ∫ _2𝜋_{3 2}⁄¹|𝜬|^1/2 𝜑⁻¹(𝑤)

−∞

𝜑⁻¹(𝑣)

−∞ 𝑒𝑥𝑝 (−¹

2𝒘^𝑇𝜬⁻¹𝒘) 𝑑𝒘

𝜑⁻¹(𝑢)

−∞

………. ……..5-11

where 𝚸 = [

1 𝜌₁₂ 𝜌₁₃ 𝜌₁₂ 1 𝜌₂₃ 𝜌₁₃ 𝜌₂₃ 1

] is the symmetrical correlation matrix with 𝜌_𝑖𝑗 𝜖 [−1,1]; 𝐰 = (w₁, w₂, w₃)^𝑇 represents the corresponding integral variables.

The primary argument for the choice of Gaussian copula stems from its capability to model all ranges of dependence (−1,1) for multidimensional case. It has a property that as the number of dimensions (n) in a Gaussian copula increases, the number of parameters in the multivariate density function also increases in the order of 𝑛². With a view to model the general positive and negative dependence structures among the three microscopic traffic variables, the trivariate Gaussian copula can therefore be a convenient choice and is hence, employed in this study.

The most difficult problem in implementing copula theory is the estimation of unknown parameters of the copula. Some commonly used approaches that are mostly employed for estimating copula parameters are Inference from Margins (IFM) approach (Joe, 1997), maximum pseudo-likelihood method and approach based on the rank correlation (Genest &

Rivest, 1993). Another straightforward approach for estimating the dependence parameter resorts to the approach based on rank correlation. This approach utilises the relationship between the dependence parameter 𝜃 and the two rank-based estimators Kendall’s 𝜏 and Spearman’s 𝜌_𝑆. Kojadinivic & Yan (2010) proved that Kendall’s 𝜏 performs significantly better than Spearman’s 𝜌_𝑆; also the expression of Kendall’s 𝜏 is explicit for all copula families. Due to these advantages, inversion of rank correlation coefficient Kendall’s 𝜏 method is widely used for estimating the dependence parameter 𝜃. This method is very simple and the marginal

distributions need not be modelled to obtain an estimate of 𝜃. In the current paper, parameters of bivariate Gaussian, FGM and Archimedean copulas are computed using the approach based on Kendall’s 𝜏 while for the trivariate case, maximum pseudo-likelihood estimator is used.

Dalam dokumen Parametric study of heterogeneous and no-lane disciplined traffic stream (Halaman 128-132)