2 Methodology 2.1 Cosine Distribution - (ICSA Book Series in Statistics) Wenqing He, Liqun Wang,

The rest of this chapter is organized as follows. Section 2 introduces the inference background and the new method, which can control the FDR and sensitivity simultaneously. Numerical studies are reported in Sect. 3 to show the superior performance of the proposed method in high-dimensional settings, even when strong multicollinearity exists among the predictors. Conclusions and discussion are given in Sect. 4.

2 Methodology

where $v_{M_k} = 1/A_{M_k}$. Hence, $X_{M_k}^T u_{M_k} = A_{M_k} \mathbf{1}_{M_k}$. The correlation vector between the equiangular direction and all predictors can be calculated by

$$a = X^T u_{M_k}. \tag{8}$$

Let $S_{M_k}^T X_{M_k}^T u_{M_k}$ be a subvector of $a$ for $|M_k| < n$. At the $k$th stage of selecting the entering predictor, let $\hat{C}_k$ be the largest absolute value of the correlation between the entering variables and the current residual $Z_k$. LARS finds the predictor that has the smallest angle with the current residual, and proceeds in the direction of $u_{M_k}$, which has the same angle with all $X_{j_k}$'s, $j_k \in M_k$, in a step size of $\hat{\gamma}$ until the next predictor earns its “most correlated” position. By the end of each stage, LARS updates the mean function, i.e.,

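$$\hat{\mu}_{M_{k+1}} = \hat{\mu}_{M_k} + \hat{\gamma}\, u_{M_k}, \tag{9}$$

where

$$\hat{\gamma} = \min_{l \notin M_k}{}^{+} \left( \frac{\hat{C}_k - \hat{c}_l}{A_{M_k} - a_l},\; \frac{\hat{C}_k + \hat{c}_l}{A_{M_k} + a_l} \right), \tag{10}$$

where $\hat{c}_l$ is the current correlation of the $l$th remaining predictor variable and $\min^{+}$ indicates the smallest positive value. The mean function $\hat{\mu}$ can be written as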

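$$\hat{\mu}_{M_k} = U_{M_k} \hat{\boldsymbol{\gamma}}_{M_k}, \tag{11}$$

where $U_{M_k} = (u_1, u_2, \ldots, u_k)$ and $\hat{\boldsymbol{\gamma}}_{M_k} = (\hat{\gamma}_1, \hat{\gamma}_2, \ldots, \hat{\gamma}_k)^T$. Denote $\hat{\beta}(\hat{C}_k)$ as the regression coefficients of the active predictors at stage $k$, $\hat{\beta}(\hat{C}_k) = (X_S^T X_S)^{-1} X_S^T U_{M_k} \hat{\boldsymbol{\gamma}}_{M_k}$. The current correlation can also be expressed as the score vector of the least squares criterion with the entering predictor: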

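$$\hat{C}_k = -s_{j_k} \left. \frac{\partial}{\partial \beta_{j_k}} \sum_{i=1}^{n} (y_i - x_i^T \beta)^2 \right|_{\beta = \hat{\beta}(\hat{C}_k)}. \tag{12}$$

Define $\theta(X_{j_k}, Z_k)$ as the angle between the vectors $X_{j_k}$ and $Z_k$. Since $X_{j_k}$ is standardized, we have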

$$\cos\{\theta(X_{j_k}, Z_k)\} = \frac{\|X^T Z_k\|_\infty}{\|Z_k\|_2} = \frac{\hat{C}_k}{\|Z_k\|_2}. \tag{13}$$

In general, $|\cos\{\theta(X_{j_k}, Z_k)\}|$, $k = 1, 2, 3, \ldots$, diminish stochastically. The LARS solution path ends at a predetermined step or when the angle $\theta(X_{j_k}, Z_k)$ is very close to $\pi/2$, i.e., when the remaining variable is almost orthogonal to the current residual.
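As a rough illustration of Eqs. (10) and (13), the following Python sketch computes the step size $\hat{\gamma}$ and the cosine of the angle between the current residual and its most correlated predictor. The function names `lars_step_size` and `cos_angle` are hypothetical, and the sketch assumes that "standardized" means every column of $X$ has unit Euclidean norm; it is an illustration under those assumptions, not the authors' implementation.

```python
import math


def lars_step_size(C_hat, A, c, a):
    """Smallest positive candidate step, following Eq. (10).

    C_hat : current maximal absolute correlation
    A     : normalizing constant A_{M_k} of the equiangular direction
    c, a  : current correlations c_l and entries a_l of X^T u for the
            predictors that are not yet active (same order in both lists)
    """
    candidates = []
    for c_l, a_l in zip(c, a):
        for num, den in ((C_hat - c_l, A - a_l), (C_hat + c_l, A + a_l)):
            if den != 0:
                step = num / den
                if step > 0:  # "min+" keeps only positive values
                    candidates.append(step)
    if not candidates:
        raise ValueError("no positive candidate step")
    return min(candidates)


def cos_angle(X, z):
    """Eq. (13): max_j |X_j^T z| / ||z||_2.

    X is a list of rows; its columns are ASSUMED to have unit norm.
    """
    n, p = len(z), len(X[0])
    norm_z = math.sqrt(sum(v * v for v in z))
    if norm_z == 0:
        raise ValueError("residual is zero; the angle is undefined")
    max_corr = max(abs(sum(X[i][j] * z[i] for i in range(n))) for j in range(p))
    return max_corr / norm_z


# Toy inputs (made up for illustration only).
print(lars_step_size(4.0, 1.0, [3.0], [0.5]))
print(cos_angle([[1.0, 0.0], [0.0, 1.0]], [3.0, 4.0]))
```

In the toy call to `cos_angle`, the predictors are the two coordinate axes, so the returned value is simply the largest absolute residual component divided by the residual norm.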

Lemma 1 For $A_{M_k} \geq 1$, the sequence $|\cos\{\theta(X_{j_k}, Z_k)\}|$, $k = 1, 2, \ldots, n-1$, is non-increasing along the LARS solution path.

Proof For simplicity, we use $\theta_k$ to denote $\theta(X_{j_k}, Z_k)$.

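Note that $\hat{C}_k$ declines as $k$ increases (Efron et al. 2004). Showing $1 \geq \frac{\hat{C}_1}{\|Z_1\|_2} \geq \frac{\hat{C}_2}{\|Z_2\|_2} \geq \cdots$ is equivalent to showing $\frac{\hat{C}_k}{\hat{C}_{k+1}} \geq \frac{\|Z_k\|_2}{\|Z_{k+1}\|_2} \geq 1$, for $k = 1, 2, \ldots$.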

By Eq. (9), $Z_k - Z_{k+1} = \hat{\gamma}_k u_{M_k}$. Hence, $\hat{\gamma}_k^2 = (Z_k - Z_{k+1})^T (Z_k - Z_{k+1})$, for $k = 1, 2, \ldots$.

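From Eqs. (5), (8), (9), and (12), we obtain

$$\hat{C}_k - \hat{C}_{k+1} = \hat{\gamma}_k A_k \geq \hat{\gamma}_k = \|Z_k - Z_{k+1}\|_2 \geq \|Z_k\|_2 - \|Z_{k+1}\|_2.$$

The last inequality follows from the triangle inequality. We can obtain $\frac{\hat{C}_k}{\hat{C}_{k+1}} \geq \frac{\|Z_k\|_2}{\|Z_{k+1}\|_2}$, that is, $|\cos(\theta_k)| \geq |\cos(\theta_{k+1})|$, for $k = 1, 2, \ldots, n-1$.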

Note that in the traditional linear regression model with intercept, $(1/A_{M_k})^2$ is the first element of the diagonal of the hat matrix, which always lies between $1/n$ and 1.

Lemma 2 For $Z\ (\neq 0) \in \mathbb{R}^n$, the following events are equivalent:

$$\{\|Z_{k+1}\|_2 \cos\theta_{k+1} \leq \|Z_k\|_2 \cos\theta_k \leq \|Z_{k-1}\|_2 \cos\theta_{k-1}\} = \{\theta_{k-1} \leq \theta_k \leq \theta_{k+1}\}.$$

Proof The event on the left-hand side is equivalent to $\{\hat{C}_{k+1} \leq \hat{C}_k \leq \hat{C}_{k-1}\}$, for $k = 2, 3, \ldots$, which has the monotone property as shown in Efron et al. (2004). The monotonicity of the $\theta$'s and the one-to-one correspondence of $\hat{C}_k$ and $\theta_k$, $k = 2, 3, \ldots$, have been verified in Lemma 1. Hence, the above events are equivalent.

Recall that in the linear regression model (1), a negligible or zero residual $e_i = y_i - x_i^T \hat{\beta}$ indicates a good prediction. In the LARS context, the absolute value of the corresponding angle at each knot is bounded by $\pi/2$, and no more predictors will enter the model once the angle is “big” enough. We consider an angle close to $\pi/2$ to be “big” enough.

We can make inference using the angles by assuming that they follow a (truncated) cosine distribution. We connect the angle $\theta_k$ at each knot of the LARS solution path to the incremental null hypothesis that measures whether $M_k$ statistically surpasses $M_{k-1}$ or not. The limiting distribution of the maximum angle can be used to construct an efficient and robust significance test for each predictor variable.

We propose a truncated cosine distribution in a data-driven fashion. Let $\theta_{(1)}$ and $\theta_{(n)}$ be the minimum and maximum order statistics of the $\theta$'s, respectively. On the domain $[\theta_{(1)}, \theta_{(n)}]$, we define the following (truncated) cosine distribution with the density function:

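$$f(\theta) = \begin{cases} \dfrac{1}{2b} \cos\left(\dfrac{\theta - a}{b}\right) & \text{if } \theta_{(1)} \leq \theta \leq \theta_{(n)}, \\ 0 & \text{otherwise}, \end{cases} \tag{14}$$

where the location parameter $a = (\theta_{(1)} + \theta_{(n)})/2$ and the scale parameter $b = (\theta_{(n)} - \theta_{(1)})/\pi$.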

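Its cumulative distribution function (CDF) is given by

$$F(\theta) = \begin{cases} 0 & \text{if } \theta < \theta_{(1)}, \\ \sin^2\left(\dfrac{\theta - a}{2b} + \dfrac{\pi}{4}\right) & \text{if } \theta_{(1)} \leq \theta \leq \theta_{(n)}, \\ 1 & \text{if } \theta > \theta_{(n)}. \end{cases} \tag{15}$$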
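The density (14) and the CDF (15) can be evaluated directly. The sketch below is a minimal Python version that takes the smallest and largest observed angles as inputs; the names `cosine_params`, `cosine_pdf`, and `cosine_cdf` are illustrative and do not come from the chapter. The CDF is coded as $\sin^2\{(\theta - a)/(2b) + \pi/4\}$, which equals $\{1 + \sin((\theta - a)/b)\}/2$, the integral of the density from $\theta_{(1)}$.

```python
import math


def cosine_params(theta_min, theta_max):
    """Location a and scale b of the truncated cosine distribution."""
    if theta_max <= theta_min:
        raise ValueError("theta_max must exceed theta_min")
    a = (theta_min + theta_max) / 2
    b = (theta_max - theta_min) / math.pi
    return a, b


def cosine_pdf(theta, theta_min, theta_max):
    """Density of Eq. (14); zero outside [theta_min, theta_max]."""
    a, b = cosine_params(theta_min, theta_max)
    if theta < theta_min or theta > theta_max:
        return 0.0
    return math.cos((theta - a) / b) / (2 * b)


def cosine_cdf(theta, theta_min, theta_max):
    """CDF of Eq. (15): sin^2((theta - a) / (2b) + pi / 4) on the support."""
    a, b = cosine_params(theta_min, theta_max)
    if theta < theta_min:
        return 0.0
    if theta > theta_max:
        return 1.0
    return math.sin((theta - a) / (2 * b) + math.pi / 4) ** 2


# Made-up angle range (in radians), for illustration only.
lo, hi = 0.2, 1.5
mid = (lo + hi) / 2
print(cosine_pdf(mid, lo, hi), cosine_cdf(mid, lo, hi))
```

At the midpoint $a$ of the support the density attains its maximum $1/(2b)$ and the CDF equals one half, which is a quick sanity check on the parametrization.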

This CDF, $F(\theta)$, of the cosine distribution can be used to test the hypothesis of whether “$M_k$ improves over $M_{k-1}$” by the following theorem.

Theorem 1 Assume that the covariate vectors $X_j$, $j = 1, \ldots, p$, are linearly independent in the LARS solution path. Let $\theta_{(j)}$, $j = 1, \ldots, n$, be the corresponding angle at each knot $\hat{C}_j$ in the first $n$ steps, and let $a$ and $b$ be defined as in Eq. (14). If Lemmas 1 and 2 hold, then

$$\frac{n}{2b^2} \left(\frac{\pi}{2} - \theta_{(n)}\right)^2 \xrightarrow{d} \chi_2^2 \quad \text{as } n \to \infty, \tag{16}$$

where $\chi_2^2$ denotes a chi-square random variable with $df = 2$.

Proof We know that the $\theta_{(j)}$'s, $j = 1, \ldots, n$, are monotone increasing. Hence, $\theta_{(1)}$ and $\theta_{(n)}$ can be considered as the minimum and maximum order statistics of the $\theta$'s. As the dimension increases, $\frac{\pi}{2} - \theta_{(n)}$ will diminish stochastically.

Let $\tilde{\theta}_n = \frac{n}{2b^2} \left(\frac{\pi}{2} - \theta_{(n)}\right)^2$. From the CDF of the cosine distribution, Eq. (15), and the basic trigonometric formula, the distribution of $\tilde{\theta}_n$ can be derived as follows:

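$$\begin{aligned} P(\tilde{\theta}_n \leq g) &= P\left\{ \frac{n}{2b^2} \left(\frac{\pi}{2} - \theta_{(n)}\right)^2 \leq g \right\} \\ &= P\left\{ \theta_{(n)} \geq \frac{\pi}{2} - b \left(\frac{2g}{n}\right)^{1/2} \right\} \\ &= 1 - \sin^{2n}\left[ \frac{\frac{\pi}{2} - b \left(\frac{2g}{n}\right)^{1/2} - a}{2b} + \frac{\pi}{4} \right] \\ &= 1 - \cos^{2n}\left[ \frac{1}{2} \left(\frac{2g}{n}\right)^{1/2} + \frac{\pi}{4} - \frac{\pi}{4b} + \frac{a}{2b} \right], \end{aligned}$$

over $0 \leq g \leq \frac{n\pi^2}{8b^2}$. Therefore, the limiting distribution of $\tilde{\theta}_n$ is obtained as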

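$$\lim_{n \to \infty} P(\tilde{\theta}_n \leq g) = 1 - \lim_{n \to \infty} \cos^{2n}\left[ \frac{1}{2} \left(\frac{2g}{n}\right)^{1/2} + \frac{\pi}{4} - \frac{\pi}{4b} + \frac{a}{2b} \right] \approx 1 - e^{-g/2}, \quad g \geq 0,$$

since

$$\cos^{2n}\left[ \frac{1}{2} \left(\frac{2g}{n}\right)^{1/2} + \frac{\pi}{4} - \frac{\pi}{4b} + \frac{a}{2b} \right] \approx \left(1 - \frac{g}{4n}\right)^{2n} \to e^{-g/2} \quad \text{as } n \to \infty.$$

Hence, $\tilde{\theta}_n \xrightarrow{d} \chi_2^2$.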
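The last approximation can be checked numerically. The short Python sketch below compares $\cos^{2n}\{\frac{1}{2}(2g/n)^{1/2}\}$ and $(1 - g/(4n))^{2n}$ with $e^{-g/2}$ for growing $n$. It assumes that the constant offset $\pi/4 - \pi/(4b) + a/(2b)$ inside the cosine is zero, which is the case when $\theta_{(n)} = \pi/2$; the function names are hypothetical.

```python
import math


def cos_power(g, n):
    """cos^{2n}( (1/2) * sqrt(2g/n) ), ASSUMING the constant offset is zero."""
    return math.cos(0.5 * math.sqrt(2 * g / n)) ** (2 * n)


def poly_power(g, n):
    """(1 - g/(4n))^{2n}, the second-order approximation of the cosine term."""
    return (1 - g / (4 * n)) ** (2 * n)


g = 3.0  # arbitrary evaluation point, for illustration only
limit = math.exp(-g / 2)
for n in (10, 1_000, 100_000):
    print(n, cos_power(g, n) - limit, poly_power(g, n) - limit)
```

Both differences shrink as $n$ grows, in line with $1 - e^{-g/2}$ being the CDF of a chi-square variable with two degrees of freedom.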

The limiting distribution of $\tilde{\theta}_n$ determines if the corresponding angle at knot $\hat{C}_k$ is “big” enough. A sequence of $p$-values can be obtained by using the above property, $P(\chi_2^2 > \tilde{\theta}_j)$, $j = 1, \ldots, n$.
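A possible way to compute these $p$-values is sketched below in Python. Because a chi-square variable with two degrees of freedom has survival function $e^{-x/2}$, no statistical library is needed. The sketch assumes one particular reading of $\tilde{\theta}_j$, namely that the $j$th angle is plugged into the statistic of Eq. (16) with the same $n$ and $b$ for every $j$; this reading and the function names are assumptions, not statements from the chapter.

```python
import math


def angle_statistic(theta, n, b):
    """n / (2 b^2) * (pi/2 - theta)^2, the statistic of Eq. (16) at angle theta."""
    return n / (2 * b ** 2) * (math.pi / 2 - theta) ** 2


def chi2_df2_sf(x):
    """P(chi-square with 2 df > x), which equals exp(-x/2) for x >= 0."""
    return math.exp(-x / 2) if x > 0 else 1.0


def angle_p_values(thetas):
    """p-values for a sequence of knot angles (radians).

    ASSUMPTION: n is the number of angles and b = (max - min) / pi,
    shared by all angles.
    """
    n = len(thetas)
    spread = max(thetas) - min(thetas)
    if n < 2 or spread <= 0:
        raise ValueError("need at least two distinct angles")
    b = spread / math.pi
    return [chi2_df2_sf(angle_statistic(t, n, b)) for t in thetas]


# Made-up increasing angles, for illustration only.
print(angle_p_values([0.3, 0.8, 1.2, 1.5]))
```

Under this reading, angles closer to $\pi/2$ give smaller statistics and therefore larger $p$-values, so late knots with nearly orthogonal predictors are the ones that fail to reject.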

2.2 Selection Criteria

Definition 1 (Family of “Accumulation Tests,” Li & Barber, 2017) Let $M_m$ be the model that includes the first $m$ entries. For an integer $k \in \{1, \ldots, m\}$, a sequence of null hypotheses, $H_j$, $j = 1, 2, \ldots, k$, measures whether model $M_j$ statistically surpasses $M_{j-1}$ or not. Suppose there is a sequence of uniformly distributed $p$-values, $p_1, p_2, \ldots, p_k \in [0, 1]$, corresponding to the hypotheses $H_j$. For a given function $\varphi : [0, 1] \to [0, \infty)$ satisfying $\int_{t=0}^{1} \varphi(t)\,dt = 1$, where $\varphi$ is termed the “accumulation function,” the “accumulation tests” determine the stopping point $\hat{k}$ to control FDR at level $\alpha$:

$$\hat{k}_\varphi = \max\left\{ k \in \{1, \ldots, m\} : \frac{1}{k} \sum_{j=1}^{k} \varphi(p_j) \leq \alpha \right\}. \tag{17}$$

We suggest using $\varphi(x) = \frac{x}{\sqrt{1 - x^2}}$ to choose the stopping point $\hat{k}_\varphi$. Testing the hypothesis $H_0$: “the $j$th angle is the maximum one” is equivalent to testing whether the current model is adequate along the LARS solution path. We reject all hypotheses up to $\hat{k}_\varphi$ to obtain the final model.
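The stopping rule of Eq. (17) with the suggested accumulation function can be sketched in a few lines of Python. The function names `phi` and `accumulation_stop` are hypothetical, and returning 0 when no $k$ satisfies the bound is a convention chosen here for illustration, not one stated in the text.

```python
import math


def phi(x):
    """Accumulation function x / sqrt(1 - x^2); it integrates to 1 on [0, 1]."""
    if x >= 1:
        return math.inf  # the function is unbounded as x approaches 1
    return x / math.sqrt(1 - x * x)


def accumulation_stop(p_values, alpha, accumulate=phi):
    """Largest k with (1/k) * sum_{j<=k} accumulate(p_j) <= alpha, as in Eq. (17).

    Returns 0 when no k qualifies (ASSUMED convention: reject nothing).
    """
    best = 0
    total = 0.0
    for k, p in enumerate(p_values, start=1):
        total += accumulate(p)
        if total / k <= alpha:
            best = k  # keep scanning: Eq. (17) takes the maximum such k
    return best


# Made-up p-values ordered along a solution path, for illustration only.
p_seq = [0.01, 0.02, 0.03, 0.9, 0.95]
print(accumulation_stop(p_seq, alpha=0.1))
```

With these made-up $p$-values and $\alpha = 0.1$, the running average stays below $\alpha$ for the first three hypotheses and jumps above it once the large $p$-values enter, so the rule stops at $k = 3$.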

Source: Wenqing He, Liqun Wang, Jiahua Chen, Chunfang Devon Lin (eds.), Advances and Innovations in Statistics and Data Science, ICSA Book Series in Statistics, Springer (2022), pages 72-77.