
The rest of this chapter is organized as follows. Section 2 introduces the inference background and the new method, which can control the FDR and sensitivity simultaneously. Numerical studies are reported in Sect. 3 to show the superior performance of the proposed method in high-dimensional settings, even when strong multicollinearity exists among the predictors. The conclusion and discussion are given in Sect. 4.

2 Methodology

where $v_{M_k} = 1/A_{M_k}$. Hence, $X_{M_k}^{T} u_{M_k} = A_{M_k} \mathbf{1}_{M_k}$. The correlation vector between the equiangular direction and all predictors can be calculated by

$$a = X^{T} u_{M_k}. \tag{8}$$
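The equiangular quantities can be computed directly. The following is a minimal numpy sketch, assuming the construction of Efron et al. (2004): the active columns are first multiplied by the signs of their current correlations, $G$ is their Gram matrix, $A_{M_k} = (\mathbf{1}^{T} G^{-1} \mathbf{1})^{-1/2}$ and $u_{M_k} = X_{M_k} A_{M_k} G^{-1}\mathbf{1}$. The function name, the random design, and the chosen active set and signs are hypothetical.

```python
import numpy as np


def equiangular_direction(X_active, signs):
    """Return (u, A) for the signed active predictors, following Efron et al. (2004)."""
    Xs = X_active * signs                        # flip columns so every active correlation is positive
    ones = np.ones(Xs.shape[1])
    G_inv_1 = np.linalg.solve(Xs.T @ Xs, ones)   # G^{-1} 1 without forming G^{-1}
    A = 1.0 / np.sqrt(ones @ G_inv_1)            # A_{M_k} = (1^T G^{-1} 1)^{-1/2}
    u = Xs @ (A * G_inv_1)                       # unit vector with equal angles to all active columns
    return u, A


rng = np.random.default_rng(1)
X = rng.standard_normal((50, 6))
X -= X.mean(axis=0)
X /= np.linalg.norm(X, axis=0)                   # standardized predictors (unit-length columns)

active = [0, 2, 3]                               # hypothetical active set M_k
signs = np.array([1.0, -1.0, 1.0])               # hypothetical signs of their correlations
u, A = equiangular_direction(X[:, active], signs)
a = X.T @ u                                      # Eq. (8): correlations of u with all predictors
```

In these signed coordinates the identity $X_{M_k}^{T} u_{M_k} = A_{M_k}\mathbf{1}_{M_k}$ reads `signs * a[active] == A`, which can be checked numerically.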

Let $S_{M_k}^{T} X_{M_k}^{T} u_{M_k}$ be a subvector of $a$ for $|M_k| < n$. At the $k$th stage of selecting the entering predictor, let $\hat{C}_k$ be the largest absolute value of the correlation between the entering variables and the current residual $Z_k$. LARS finds the predictor that has the smallest angle with the current residual and proceeds in the direction of $u_{M_k}$, which makes equal angles with all $X_{j_k}$'s, $j_k \in M_k$, with a step size of $\hat{\gamma}$ until the next predictor earns its “most correlated” position. At the end of each stage, LARS updates the mean function, i.e.,

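$$\hat{\mu}_{M_{k+1}} = \hat{\mu}_{M_k} + \hat{\gamma}\, u_{M_k}, \tag{9}$$

where

$$\hat{\gamma} = \min_{l \notin M_k}{}^{+} \left\{ \frac{\hat{C}_k - \hat{c}_l}{A_{M_k} - a_l},\ \frac{\hat{C}_k + \hat{c}_l}{A_{M_k} + a_l} \right\}, \tag{10}$$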

where $\hat{c}_l$ is the current correlation of the $l$th remaining predictor variable and $\min^{+}$ indicates the smallest positive value. The mean function $\hat{\mu}$ can be written as

$$\hat{\mu}_{M_k} = U_{M_k} \Gamma_{M_k}, \tag{11}$$

where $U_{M_k} = (u_1, u_2, \ldots, u_k)$ and $\Gamma_{M_k} = (\hat{\gamma}_1, \hat{\gamma}_2, \ldots, \hat{\gamma}_k)^{T}$. Denote by $\hat{\beta}(\hat{C}_k)$ the regression coefficients of the active predictors at stage $k$, $\hat{\beta}(\hat{C}_k) = (X_S^{T} X_S)^{-1} X_S^{T} U_{M_k} \Gamma_{M_k}$. The current correlation can also be expressed as the score vector of the least squares criterion with respect to the entering predictor:

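$$\hat{C}_k = -\frac{s_{j_k}}{2} \frac{\partial}{\partial \beta_{j_k}} \sum_{i=1}^{n} \left(y_i - x_i^{T}\beta\right)^2 \Big|_{\beta = \hat{\beta}(\hat{C}_k)}. \tag{12}$$

Define $\theta(X_{j_k}, Z_k)$ as the angle between the vectors $X_{j_k}$ and $Z_k$. Since $X_{j_k}$ is standardized (i.e., $\|X_{j_k}\|_2 = 1$), we have

$$\cos\{\theta(X_{j_k}, Z_k)\} = \frac{X_{j_k}^{T} Z_k}{\|Z_k\|_2} = \frac{\hat{C}_k}{\|Z_k\|_2}. \tag{13}$$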

In general, $|\cos\{\theta(X_{j_k}, Z_k)\}|$, $k = 1, 2, 3, \ldots$, diminishes stochastically. The LARS solution path ends at a predetermined step or when the angle $\theta(X_{j_k}, Z_k)$ is very close to $\pi/2$, i.e., when the remaining variable is almost orthogonal to the current residual.
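One complete LARS stage, Eqs. (8) to (10), together with the cosine in Eq. (13), can be sketched as follows. This is a minimal numpy illustration under stated assumptions, not the authors' implementation: the helper names, the tolerance used for $\min^{+}$, the full least-squares step $\hat{C}_k/A_{M_k}$ taken when no inactive predictor remains, and the simulated data are all hypothetical.

```python
import numpy as np


def equiangular(Xs):
    """A_{M_k} and the unit equiangular vector u_{M_k} for signed active columns Xs."""
    ones = np.ones(Xs.shape[1])
    G_inv_1 = np.linalg.solve(Xs.T @ Xs, ones)
    A = 1.0 / np.sqrt(ones @ G_inv_1)
    return A, Xs @ (A * G_inv_1)


def min_plus(values, tol=1e-12):
    """min^+ in Eq. (10): smallest entry above tol, or inf if there is none."""
    pos = values[values > tol]
    return pos.min() if pos.size else np.inf


def lars_step(X, y, mu, active):
    """One LARS stage: returns (updated mean, gamma_hat, index of the entering predictor)."""
    c = X.T @ (y - mu)                           # current correlations c_l
    C = np.max(np.abs(c[active]))                # \hat C_k
    A, u = equiangular(X[:, active] * np.sign(c[active]))
    a = X.T @ u                                  # Eq. (8)
    inactive = np.setdiff1d(np.arange(X.shape[1]), active)
    with np.errstate(divide="ignore", invalid="ignore"):
        cands = np.column_stack([(C - c[inactive]) / (A - a[inactive]),
                                 (C + c[inactive]) / (A + a[inactive])])
    steps = np.array([min_plus(row) for row in cands])
    if steps.size == 0 or not np.isfinite(steps.min()):
        return mu + (C / A) * u, C / A, None     # assumption: nothing left to enter
    j = int(np.argmin(steps))
    return mu + steps[j] * u, steps[j], int(inactive[j])   # Eq. (9)


rng = np.random.default_rng(0)
X = rng.standard_normal((40, 5))
X -= X.mean(axis=0)
X /= np.linalg.norm(X, axis=0)                   # standardized predictors
y = X @ np.array([3.0, -2.0, 0.0, 0.0, 1.0]) + 0.1 * rng.standard_normal(40)
y -= y.mean()

active = [int(np.argmax(np.abs(X.T @ y)))]       # first knot: most correlated predictor
mu, gamma, new = lars_step(X, y, np.zeros_like(y), active)
active.append(new)
Z = y - mu                                       # residual after the step
c = X.T @ Z
C = np.max(np.abs(c))
cos_theta = C / np.linalg.norm(Z)                # Eq. (13) with unit-length columns
```

After the step, the entering predictor and the previously active one share the largest absolute correlation with the new residual, which is exactly the “most correlated” condition that $\min^{+}$ enforces.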

Lemma 1 For $A_{M_k} \ge 1$, the sequence $|\cos\{\theta(X_{j_k}, Z_k)\}|$, $k = 1, 2, \ldots, n-1$, is non-increasing along the LARS solution path.

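Proof For simplicity, we use $\theta_k$ to denote $\theta(X_{j_k}, Z_k)$.

Note that $\hat{C}_k$ declines as $k$ increases (Efron et al. 2004). Showing

$$1 \ge \frac{\hat{C}_1}{\|Z_1\|_2} \ge \frac{\hat{C}_2}{\|Z_2\|_2} \ge \cdots$$

is equivalent to showing

$$\frac{\hat{C}_k}{\hat{C}_{k+1}} \cdot \frac{\|Z_{k+1}\|_2}{\|Z_k\|_2} \ge 1, \quad k = 1, 2, \ldots.$$

By Eq. (9), $Z_k - Z_{k+1} = \hat{\gamma}_k u_{M_k}$. Since $u_{M_k}$ is a unit vector, $\hat{\gamma}_k^2 = (Z_k - Z_{k+1})^{T}(Z_k - Z_{k+1})$, for $k = 1, 2, \ldots$.

From Eqs. (5), (8), (9), and (12), we obtain

$$\hat{C}_k - \hat{C}_{k+1} = \hat{\gamma}_k A_{M_k} \ge \hat{\gamma}_k = \|Z_k - Z_{k+1}\|_2 \ge \|Z_k\|_2 - \|Z_{k+1}\|_2,$$

where the first inequality uses $A_{M_k} \ge 1$ and the last follows from the triangle inequality. Because $\hat{C}_k \ge \hat{C}_{k+1}$ and $\hat{C}_{k+1} \le \|Z_{k+1}\|_2$ (Cauchy–Schwarz), this yields

$$\frac{\hat{C}_k}{\hat{C}_{k+1}} \ge \frac{\|Z_k\|_2}{\|Z_{k+1}\|_2},$$

that is, $|\cos(\theta_k)| \ge |\cos(\theta_{k+1})|$, for $k = 1, 2, \ldots, n-1$.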

Note that in the traditional linear regression model with an intercept, $(1/A_{M_k})^2$ is the first diagonal element of the hat matrix, which is always bounded between $n^{-1}$ and 1.

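Lemma 2 For $Z\,(\neq 0) \in \mathbb{R}^n$, the following events are equivalent:

$$\left\{\|Z_{k+1}\|_2 \cos\theta_{k+1} \le \|Z_k\|_2 \cos\theta_k \le \|Z_{k-1}\|_2 \cos\theta_{k-1}\right\} = \left\{\theta_{k-1} \le \theta_k \le \theta_{k+1}\right\}.$$

Proof The event on the left-hand side is equivalent to $\{\hat{C}_{k+1} \le \hat{C}_k \le \hat{C}_{k-1}\}$, for $k = 2, 3, \ldots$, which has the monotone property shown in Efron et al. (2004). The monotonicity of the $\theta$'s and the one-to-one correspondence between $\hat{C}_k$ and $\theta_k$, $k = 2, 3, \ldots$, have been verified in Lemma 1. Hence, the above events are equivalent.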

Recall that in the linear regression model (1), a negligible or zero residual $e_i = y_i - x_i^{T}\hat{\beta}$ indicates a good prediction. In the LARS context, the absolute value of the corresponding angle at each knot is bounded by $\pi/2$, and no more predictors enter the model once the angle is “big” enough. We consider an angle close to $\pi/2$ to be “big” enough.

We can make inference using the angle by assuming that the angles follow a (truncated) cosine distribution. We connect the angle $\theta_k$ at each step of the LARS solution path to the incremental null hypothesis that measures whether $M_k$ statistically surpasses $M_{k-1}$ or not. The limiting distribution of the maximum angle can then be used to construct an efficient and robust significance test for each predictor variable.

We propose a truncated cosine distribution constructed in a data-driven fashion. Let $\theta_{(1)}$ and $\theta_{(n)}$ be the minimum and maximum order statistics of the $\theta$'s, respectively. On the domain $[\theta_{(1)}, \theta_{(n)}]$, we define the following (truncated) cosine distribution with density function

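$$f(\theta) = \begin{cases} \dfrac{1}{2b}\cos\left(\dfrac{\theta - a}{b}\right) & \text{if } \theta_{(1)} \le \theta \le \theta_{(n)}, \\ 0 & \text{otherwise}, \end{cases} \tag{14}$$

where the location parameter $a = (\theta_{(1)} + \theta_{(n)})/2$ and the scale parameter $b = (\theta_{(n)} - \theta_{(1)})/\pi$.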

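Its cumulative distribution function (CDF) is given by

$$F(\theta) = \begin{cases} 0 & \text{if } \theta < \theta_{(1)}, \\ \sin^2\left(\dfrac{\theta - a}{2b} + \dfrac{\pi}{4}\right) & \text{if } \theta_{(1)} \le \theta \le \theta_{(n)}, \\ 1 & \text{if } \theta > \theta_{(n)}. \end{cases} \tag{15}$$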
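Equations (14) and (15) translate directly into code. The sketch below, a minimal numpy illustration with hypothetical function names and made-up angles, computes $a$ and $b$ from the observed angles, evaluates the density and CDF, and adds the inverse CDF $F^{-1}(q) = a + 2b\{\arcsin(\sqrt{q}) - \pi/4\}$, obtained by solving Eq. (15) for $\theta$, which is convenient for simulation.

```python
import numpy as np


def cosine_params(theta):
    """Location a and scale b of Eq. (14), computed from the observed angles."""
    lo, hi = np.min(theta), np.max(theta)
    return (lo + hi) / 2.0, (hi - lo) / np.pi


def cosine_pdf(t, a, b):
    """Density (14); its support [a - pi*b/2, a + pi*b/2] equals [theta_(1), theta_(n)]."""
    t = np.asarray(t, dtype=float)
    inside = np.abs(t - a) <= np.pi * b / 2.0
    return np.where(inside, np.cos((t - a) / b) / (2.0 * b), 0.0)


def cosine_cdf(t, a, b):
    """CDF (15): clipping to the support gives 0 below it and 1 above it."""
    t = np.clip(np.asarray(t, dtype=float), a - np.pi * b / 2.0, a + np.pi * b / 2.0)
    return np.sin((t - a) / (2.0 * b) + np.pi / 4.0) ** 2


def cosine_ppf(q, a, b):
    """Inverse of (15), from solving sin^2((t - a)/(2b) + pi/4) = q for t."""
    return a + 2.0 * b * (np.arcsin(np.sqrt(np.asarray(q, dtype=float))) - np.pi / 4.0)


theta = np.array([0.35, 0.62, 1.10, 1.38, 1.49, 1.55])   # hypothetical knot angles
a, b = cosine_params(theta)
edges = np.linspace(theta.min(), theta.max(), 20001)
mids = (edges[:-1] + edges[1:]) / 2.0
total_mass = np.sum(cosine_pdf(mids, a, b)) * (edges[1] - edges[0])   # midpoint rule
```

The midpoint-rule total should be numerically indistinguishable from 1, and composing `cosine_cdf` with `cosine_ppf` returns the input probabilities.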

This CDF, $F(\theta)$, of the cosine distribution can be used to test the hypothesis that “$M_k$ improves over $M_{k-1}$” via the following theorem.

Theorem 1 Assume that the covariate vectors $X_j$'s, $j = 1, \ldots, p$, are linearly independent along the LARS solution path. Let $\theta_{(j)}$, $j = 1, \ldots, n$, be the corresponding angle at each knot $\hat{C}_j$ in the first $n$ steps, with $a$ and $b$ defined in Eq. (14). If Lemmas 1 and 2 hold, then

$$\frac{n}{2b^2}\left(\frac{\pi}{2} - \theta_{(n)}\right)^2 \xrightarrow{d} \chi^2_2 \quad \text{as } n \to \infty, \tag{16}$$

where $\chi^2_2$ denotes a chi-square random variable with $df = 2$.

Proof We know that the $\theta_{(j)}$'s, $j = 1, \ldots, n$, are monotone increasing. Hence, $\theta_{(1)}$ and $\theta_{(n)}$ can be considered as the minimum and maximum order statistics of the $\theta$'s. As the dimension increases, $\pi/2 - \theta_{(n)}$ diminishes stochastically.

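Let $\tilde{\theta}_n = \frac{n}{2b^2}\left(\frac{\pi}{2} - \theta_{(n)}\right)^2$. From the CDF of the cosine distribution in Eq. (15) and basic trigonometric identities, treating the $n$ angles as independent draws from Eq. (14) so that $P(\theta_{(n)} < t) = \{F(t)\}^n$, the distribution of $\tilde{\theta}_n$ can be derived as follows:

$$\begin{aligned}
P(\tilde{\theta}_n \le g) &= P\left\{\frac{n}{2b^2}\left(\frac{\pi}{2} - \theta_{(n)}\right)^2 \le g\right\} = P\left\{\theta_{(n)} \ge \frac{\pi}{2} - b\left(\frac{2g}{n}\right)^{1/2}\right\} \\
&= 1 - \sin^{2n}\left\{\frac{\frac{\pi}{2} - b\left(\frac{2g}{n}\right)^{1/2} - a}{2b} + \frac{\pi}{4}\right\} \\
&= 1 - \cos^{2n}\left\{\frac{1}{2}\left(\frac{2g}{n}\right)^{1/2} + \frac{\pi}{4} - \frac{\pi}{4b} + \frac{a}{2b}\right\},
\end{aligned}$$

over $0 \le g \le n\pi^2/(8b^2)$. Therefore, the limiting distribution of $\tilde{\theta}_n$ is obtained as

$$\lim_{n\to\infty} P(\tilde{\theta}_n \le g) = 1 - \lim_{n\to\infty} \cos^{2n}\left\{\frac{1}{2}\left(\frac{2g}{n}\right)^{1/2} + \frac{\pi}{4} - \frac{\pi}{4b} + \frac{a}{2b}\right\} \approx 1 - e^{-g/2}, \quad g \ge 0,$$

since the constant term $\frac{\pi}{4} - \frac{\pi}{4b} + \frac{a}{2b} = -\frac{\pi/2 - \theta_{(n)}}{2b}$ vanishes as $\theta_{(n)} \to \pi/2$, and

$$\cos^{2n}\left\{\frac{1}{2}\left(\frac{2g}{n}\right)^{1/2}\right\} \approx \left(1 - \frac{g}{4n}\right)^{2n} \to e^{-g/2} \quad \text{as } n \to \infty.$$

Hence, $\tilde{\theta}_n \xrightarrow{d} \chi^2_2$.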
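The limit can be checked by simulation. The sketch below assumes that the $n$ angles are independent draws from Eq. (14), and it additionally fixes the upper endpoint of the support at $\pi/2$ (that is, $a + \pi b/2 = \pi/2$), the case in which the term $\pi/4 - \pi/(4b) + a/(2b)$ in the proof is exactly zero. It draws $\theta_{(n)}$ by inverting $F^n$, forms $\tilde{\theta}_n$, and compares the empirical CDF with that of $\chi^2_2$, namely $1 - e^{-g/2}$. The function name and the values of $n$, $b$, the number of replications, and the seed are arbitrary choices.

```python
import numpy as np


def simulate_theta_tilde(n, b, reps, rng):
    """Draw `reps` copies of theta_tilde_n = n / (2 b^2) * (pi/2 - theta_(n))^2."""
    a = np.pi / 2.0 - np.pi * b / 2.0            # assumption: support ends exactly at pi/2
    v = rng.random(reps) ** (1.0 / n)             # U^(1/n) has the law of F(theta_(n)) = max_i F(theta_i)
    theta_max = a + 2.0 * b * (np.arcsin(np.sqrt(v)) - np.pi / 4.0)   # invert Eq. (15)
    return n / (2.0 * b ** 2) * (np.pi / 2.0 - theta_max) ** 2


rng = np.random.default_rng(2024)
sims = simulate_theta_tilde(n=2000, b=0.5, reps=200_000, rng=rng)
grid = np.array([0.5, 1.0, 2.0, 4.0, 8.0])
empirical = np.array([np.mean(sims <= g) for g in grid])
chi2_2_cdf = 1.0 - np.exp(-grid / 2.0)            # CDF of chi^2 with 2 degrees of freedom
print(np.round(empirical, 3))
print(np.round(chi2_2_cdf, 3))
```

Under these assumptions the two printed rows should agree to about two decimal places, and the sample mean of `sims` should be close to 2, the mean of $\chi^2_2$.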

The limiting distribution of $\tilde{\theta}_n$ determines whether the corresponding angle at knot $\hat{C}_k$ is “big” enough. A sequence of p-values can be obtained by using the above property: $P(\chi^2_2 \ge \tilde{\theta}_j)$, $j = 1, \ldots, n$.
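The p-value sequence can be sketched as follows, assuming that $\tilde{\theta}_j$ is formed like $\tilde{\theta}_n$ with $\theta_{(n)}$ replaced by the $j$th knot angle, that $b$ comes from Eq. (14), and that the p-value is the upper-tail probability $P(\chi^2_2 \ge \tilde{\theta}_j)$, which for two degrees of freedom equals $e^{-\tilde{\theta}_j/2}$. The function name and the example angles are hypothetical.

```python
import numpy as np


def angle_p_values(theta):
    """Upper-tail chi^2_2 p-values for knot angles theta_1, ..., theta_n (in radians)."""
    theta = np.asarray(theta, dtype=float)
    n = theta.size
    b = (theta.max() - theta.min()) / np.pi               # scale parameter of Eq. (14)
    theta_tilde = n / (2.0 * b ** 2) * (np.pi / 2.0 - theta) ** 2
    return np.exp(-theta_tilde / 2.0)                      # P(chi^2_2 >= x) = exp(-x / 2)


theta = np.array([0.35, 0.62, 1.10, 1.38, 1.49, 1.52, 1.55, 1.56])   # hypothetical angles
p = angle_p_values(theta)
```

Small early angles give p-values near 0, while angles close to $\pi/2$ give p-values near 1, matching the idea that a “big” angle signals no further improvement.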

2.2 Selection Criteria

Definition 1 (Family of “Accumulation Tests,” Li & Barber, 2017) Let $M_m$ be the model that includes the first $m$ entries. For an integer $k \in \{1, \ldots, m\}$, a sequence of null hypotheses $H_j$, $j = 1, 2, \ldots, k$, measures whether model $M_j$ statistically surpasses $M_{j-1}$ or not. Suppose there is a sequence of p-values $p_1, p_2, \ldots, p_k \in [0, 1]$ corresponding to the hypotheses $H_j$, uniformly distributed under the null. For a given function $\phi: [0, 1] \to [0, \infty)$ satisfying $\int_{0}^{1} \phi(t)\,dt = 1$, where $\phi$ is termed the “accumulation function,” the “accumulation tests” determine the stopping point $\hat{k}$ that controls the FDR at level $\alpha$:

$$\hat{k}_{\phi} = \max\left\{k \in \{1, \ldots, m\} : \frac{1}{k}\sum_{j=1}^{k} \phi(p_j) \le \alpha\right\}. \tag{17}$$
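Equation (17) is a running-mean rule and is short to implement. The sketch below is generic in $\phi$; for illustration it uses ForwardStop, $\phi(t) = \log\{1/(1-t)\}$, one member of the accumulation-test family discussed by Li and Barber (2017). Returning 0 when no $k$ qualifies (nothing rejected) is a convention we assume here, and the p-values are made up.

```python
import numpy as np


def accumulation_stop(p, phi, alpha):
    """Eq. (17): largest k with (1/k) * sum_{j<=k} phi(p_j) <= alpha; 0 if no k qualifies."""
    p = np.asarray(p, dtype=float)
    running_mean = np.cumsum(phi(p)) / np.arange(1, p.size + 1)
    ks = np.flatnonzero(running_mean <= alpha)    # every k satisfying the condition
    return int(ks[-1]) + 1 if ks.size else 0      # the maximum such k (1-based)


def forward_stop(t):
    """ForwardStop accumulation function phi(t) = log(1 / (1 - t)); it integrates to 1 on [0, 1]."""
    return -np.log1p(-t)


p = np.array([0.001, 0.004, 0.02, 0.03, 0.6, 0.8, 0.9])   # hypothetical ordered p-values
k_hat = accumulation_stop(p, forward_stop, alpha=0.1)
```

Note that Eq. (17) takes the largest qualifying $k$, not the first $k$ at which the running mean exceeds $\alpha$; the two can differ when the running mean dips back below $\alpha$.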

We suggest using $\phi(x) = x/\sqrt{1 - x^2}$ to choose the stopping point $\hat{k}_{\phi}$. Testing the null hypothesis $H_0$: “the $j$th angle is the maximum one” is equivalent to testing whether the current model is adequate along the LARS solution path. We reject all hypotheses up to $\hat{k}_{\phi}$ to obtain the final model.
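A sketch of the suggested accumulation function, assuming it reads $\phi(x) = x/\sqrt{1 - x^2}$: it is nonnegative on $[0, 1)$, integrates to 1 (its antiderivative is $-\sqrt{1 - x^2}$), and can be plugged directly into the stopping rule (17). The helper names and the p-values are hypothetical; in practice the p-values would be the angle-based ones described above.

```python
import numpy as np


def phi_suggested(x):
    """Suggested accumulation function phi(x) = x / sqrt(1 - x^2), defined on [0, 1)."""
    x = np.asarray(x, dtype=float)
    return x / np.sqrt(1.0 - x ** 2)           # diverges (inf) as x -> 1


def accumulation_stop(p, phi, alpha):
    """Eq. (17): largest k whose running mean of phi(p_j) is <= alpha; 0 if none."""
    running_mean = np.cumsum(phi(p)) / np.arange(1, len(p) + 1)
    ks = np.flatnonzero(running_mean <= alpha)
    return int(ks[-1]) + 1 if ks.size else 0


# The integral of phi over [0, 1] is exactly 1; a midpoint rule confirms this
# numerically despite the integrable singularity at x = 1.
edges = np.linspace(0.0, 1.0, 1_000_001)
mids = (edges[:-1] + edges[1:]) / 2.0
integral = np.sum(phi_suggested(mids)) * (edges[1] - edges[0])

p = np.array([0.001, 0.004, 0.02, 0.03, 0.6, 0.8, 0.9])   # hypothetical ordered p-values
k_hat = accumulation_stop(p, phi_suggested, alpha=0.1)
```

In this example the running mean of $\phi(p_j)$ is about 0.014 after four steps and jumps to about 0.16 when the fifth p-value (0.6) enters, so $\hat{k}_{\phi} = 4$ at $\alpha = 0.1$.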