Rate and Sensitivity Using Least Angle
the selection procedure for the important variables. In this respect, methods from classical statistical inference theory may be invalid because of the stochastic components in the high-dimensional structure. Some recent developments on making inference while performing variable selection with LASSO and forward-type regression can be found in the literature. Relatively recent approaches to post-selection inference were discussed in Berk et al. (2013), Lockhart et al. (2014), and Lee et al. (2016). Berk et al. (2013) tackled the valid post-selection inference (PoSI) problem by forming statistical tests and obtaining confidence intervals in linear models after selecting a subset of variables in a data-driven fashion. Lockhart et al. (2014) and Lee et al. (2016) illustrated post-selection inference for LASSO by forming exact hypothesis tests and constructing confidence intervals, respectively.
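Assume that the relationship between the response and the predictor variables can be postulated through a linear model:
$$Y = X\beta + \varepsilon, \tag{1}$$
where $X \in \mathbb{R}^{n \times p}$; $X_i \in \mathbb{R}^{p \times 1}$, $i = 1, \ldots, n$, is the $i$th observation of the predictors (the $i$th row of $X$ written as a column vector) with $\operatorname{cov}(X_i) = \Sigma$; $\beta \in \mathbb{R}^{p}$ is the vector of coefficients of the respective covariates; and $\varepsilon \sim N(0, \sigma^2 I)$ is the noise in the model.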
For high-dimensional data analysis with $p > n$, variable selection is often applied through regularization to identify the variables that have a real relationship with the response:
$$\hat{\beta} = \arg\min_{\beta} \left\{ \|Y - X\beta\|_2^2 + p_\lambda(\beta) \right\},$$
where $p_\lambda(\beta)$ is the penalty function of the coefficients $\beta$ and $\lambda \ge 0$ is a tuning parameter. Many regularization methods with different penalty functions have been proposed in the literature. For example, LASSO regularization (Tibshirani 1996), or an $L_1$ penalty, uses the sum of the absolute values of the coefficients as the penalty:
$$\hat{\beta} = \arg\min_{\beta} \left\{ \|Y - X\beta\|_2^2 + \lambda \|\beta\|_1 \right\}. \tag{2}$$
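A minimal sketch of how (2) can be minimised numerically is given here, using cyclic coordinate descent with soft-thresholding. This is one standard solver, not necessarily the one used in this chapter. The function names `soft_threshold` and `lasso_cd` and the toy data are illustrative assumptions. Because (2) carries no factor of 1/2 on the squared loss, the threshold in each coordinate update is $\lambda/2$.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, Y, lam, n_iter=200):
    """Minimise ||Y - X b||_2^2 + lam * ||b||_1 by cyclic coordinate descent.

    X is a list of n rows (each a list of p floats); Y is a list of n floats.
    Hypothetical helper written for illustration only.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(Y)  # residual Y - X beta, with beta = 0 at the start
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column can never enter the model
            # Inner product of column j with the partial residual (column j excluded)
            rho = sum(X[i][j] * resid[i] for i in range(n)) + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam / 2.0) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


# Toy orthonormal design: the solution is S(X_j'Y, lam / 2) coordinate by coordinate.
X_toy = [[1.0, 0.0], [0.0, 1.0]]
Y_toy = [3.0, -2.0]
beta_toy = lasso_cd(X_toy, Y_toy, lam=2.0)
```

For a sufficiently large `lam` every coefficient is thresholded to zero, which matches the empty model at the start of the solution path.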
The LASSO solution path $\hat{\beta}(\lambda)$ can be viewed as a continuous piecewise linear function of the tuning parameter $\lambda$ with a sequence of decreasing knots, i.e., $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n \ge 0$. To assess a predictor that enters the LASSO solution path at a given knot, Lockhart et al. (2014) proposed a covariance test statistic. Let $\mathcal{M}_k = \{j_1, \ldots, j_k\}$ be the model along the LASSO solution path, with increasing complexity controlled by the tuning parameter $\lambda$. The corresponding tuning parameter at each step is $\lambda_i$, $i = 1, \ldots, k$. Note that $\lambda_0 = \infty$ corresponds to $\mathcal{M}_0 = \emptyset$. The $j_k$th ($j_k \ge 2$) predictor is added to the model after the solution path $\mathcal{M}_{k-1}$ has been obtained. Let the estimate of the coefficient vector $\beta$ at the end of the $(k-1)$th step be $\hat{\beta}(\lambda_k)$. If we refit the LASSO using only the variables in $\mathcal{M}_{k-1}$ with the tuning parameter $\lambda_k$, the estimate at the end of this step is $\hat{\beta}_{\mathcal{M}_{k-1}}(\lambda_k)$. The covariance test statistic of the $j_k$th predictor proposed in Lockhart et al. (2014) is then defined as
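$$T_{j_k} = \frac{1}{\sigma^2} \left( \left\langle Y, X\hat{\beta}(\lambda_k) \right\rangle - \left\langle Y, X_{\mathcal{M}_{k-1}} \hat{\beta}_{\mathcal{M}_{k-1}}(\lambda_k) \right\rangle \right). \tag{3}$$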
The statistic $T_{j_k}$ measures how much $X_{j_k}$ contributes to improving the fitted model over the interval $(\lambda_{k-1}, \lambda_k)$. With high probability, a larger value of $T_{j_k}$ indicates a bigger contribution of the variable $X_{j_k}$ to the model $\mathcal{M}_{k-1} \cup \{j_k\}$. Under the null hypothesis that all truly important variables are contained in the model $\mathcal{M}_{k-1} \cup \{j_k\}$, $T_{j_k} \xrightarrow{d} \operatorname{Exp}(1)$ as $n, p \to \infty$.
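To make (3) concrete, the following sketch evaluates the covariance test statistic on a toy problem with orthonormal predictor columns, where the LASSO solution of (2) has the closed form $\hat{\beta}_j = S(X_j^T Y, \lambda/2)$ and the knots are $2|X_j^T Y|$. Several things here are assumptions made for illustration: the orthonormal design, the known $\sigma^2 = 1$, the helper names, and the choice to evaluate both fits at the knot where the next variable is about to enter.

```python
def soft_threshold(z, t):
    """S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def lasso_orthonormal(cols, Y, lam, subset):
    """LASSO fit of (2) restricted to `subset`; valid only for orthonormal columns."""
    return {j: soft_threshold(dot(cols[j], Y), lam / 2.0) for j in subset}


def fitted_inner_product(cols, Y, beta):
    """<Y, X beta> for a sparse coefficient dict {column index: coefficient}."""
    return sum(b * dot(cols[j], Y) for j, b in beta.items())


def covariance_test(cols, Y, lam, prev_model, sigma2):
    """Statistic (3): full LASSO fit versus the fit using only prev_model, both at lam."""
    full = lasso_orthonormal(cols, Y, lam, range(len(cols)))
    reduced = lasso_orthonormal(cols, Y, lam, prev_model)
    diff = fitted_inner_product(cols, Y, full) - fitted_inner_product(cols, Y, reduced)
    return diff / sigma2


# Toy data (hypothetical): three orthonormal predictor columns in R^3.
cols = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
Y = [3.0, -2.0, 0.5]
knots = sorted((2.0 * abs(dot(c, Y)) for c in cols), reverse=True)  # 6.0, 4.0, 1.0

T_first = covariance_test(cols, Y, knots[1], prev_model=[], sigma2=1.0)
T_second = covariance_test(cols, Y, knots[2], prev_model=[0], sigma2=1.0)
```

In this orthonormal toy case both statistics reduce to a product of the entering variable's absolute correlation with $Y$ and its gap to the next one, so a predictor that enters well ahead of its successor receives a large value.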
Lee et al. (2016) discussed a general scheme for post-selection inference that yields exact p-values and confidence intervals in the Gaussian case. Recall the linear model (1) with $\mu = X\beta$ and $\varepsilon \sim N(0, \sigma^2 I)$. Under the deterministic design matrix setting, $y \sim N(\mu, \sigma^2 I)$. For a given matrix $M$ and vector $b$, a set of linear inequalities in $y$, i.e., $\{My \le b\}$, can be used for variable selection. Let $\mathcal{M}$ be the current solution set and $\eta = X_{\mathcal{M}} (X_{\mathcal{M}}^T X_{\mathcal{M}})^{-1} e_j$, $j = 1, 2, \ldots, |\mathcal{M}|$, where $e_j$ is a vector with 1 as its $j$th element and 0's elsewhere. Inferences about $\eta^T \mu$ conditional on $\{My \le b\}$ can be made using a truncated normal distribution.
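This property makes it possible to construct a $1 - \alpha$ level selection interval for $\eta^T \mu$, whose two endpoints are obtained by solving for the values of $\eta^T \mu$ at which the truncated normal probability satisfies $P(\eta^T \mu) \ge 1 - \alpha/2$ and $P(\eta^T \mu) \le \alpha/2$, respectively.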
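The interval construction can be sketched numerically as follows. The sketch assumes that, conditional on the selection event, $\eta^T y$ follows a normal distribution with mean $\eta^T \mu$ and standard deviation $\sigma \|\eta\|_2$ truncated to an interval $[a, b]$. The truncation limits `a` and `b` (which in practice are computed from $M$, $b$, and $y$), the bracketing width, and all function names are hypothetical inputs chosen for illustration. The endpoints are found by bisection, using the fact that the truncated normal CDF at the observed value decreases as the mean increases.

```python
import math

SQRT2 = math.sqrt(2.0)


def interval_prob(z_lo, z_hi):
    """P(z_lo < Z < z_hi) for standard normal Z, using the tail that avoids cancellation."""
    if z_lo > 0:
        return 0.5 * (math.erfc(z_lo / SQRT2) - math.erfc(z_hi / SQRT2))
    return 0.5 * (math.erfc(-z_hi / SQRT2) - math.erfc(-z_lo / SQRT2))


def tn_cdf(x, mu, sd, a, b):
    """CDF at x of a N(mu, sd^2) distribution truncated to [a, b]."""
    z_a, z_x, z_b = (a - mu) / sd, (x - mu) / sd, (b - mu) / sd
    return interval_prob(z_a, z_x) / interval_prob(z_a, z_b)


def solve_mean(x, sd, a, b, target, lo, hi, n_iter=200):
    """Bisection for mu with tn_cdf(x, mu, ...) = target (the CDF is decreasing in mu)."""
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if tn_cdf(x, mid, sd, a, b) > target:
            lo = mid  # CDF still too large, so the mean must move up
        else:
            hi = mid
    return 0.5 * (lo + hi)


def selection_interval(x, sd, a, b, alpha=0.05, width=10.0):
    """(1 - alpha) interval for the mean, given observed x and truncation [a, b].

    `width` (in units of sd) sets the search bracket around x; it is an assumption.
    """
    lo, hi = x - width * sd, x + width * sd
    lower = solve_mean(x, sd, a, b, 1.0 - alpha / 2.0, lo, hi)
    upper = solve_mean(x, sd, a, b, alpha / 2.0, lo, hi)
    return lower, upper


# Hypothetical example: observed eta'y = 2.0, sd = 1.0, selection requires eta'y >= 1.0.
lower, upper = selection_interval(2.0, 1.0, 1.0, float("inf"))
```

When the observed value lies close to a truncation limit, the interval becomes noticeably wider on that side than the unadjusted normal interval, which reflects the information used up by the selection step.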
Path-based regression algorithms are widely used in high-dimensional statistics (Fan & Lv 2010), such as forward-type regression, LASSO (Tibshirani 1996), LARS (Efron et al. 2004), SIS (Fan & Lv 2008), and DTCCS (Zhao et al. 2021). To obtain a final model using these methodologies, variables are added either one by one, as in LARS and LASSO, or one group after another, as in DTCCS. For the LASSO method, the number of non-zero variables in the "best" final model depends only on a single tuning parameter, which means that a sequence of "knots" of the tuning parameter determines different final models. For DTCCS, the candidate model size is predetermined, and a group of monotone values of the tuning parameters is used to form a final model.
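As a simple illustration of a path in which variables enter one by one, the sketch below implements incremental forward stagewise regression. This is a swapped-in, simpler relative of LARS and not the LARS, LASSO, or DTCCS algorithm itself. The function name, the step size `eps`, and the toy data are assumptions, and the predictor columns are assumed to be on a common scale.

```python
def forward_stagewise(X, Y, eps=0.01, max_steps=5000):
    """Incremental forward stagewise regression (illustrative sketch).

    At each step, the coefficient of the predictor most correlated with the
    current residual is moved by eps in the direction of that correlation.
    Returns the coefficients and the order in which predictors first entered.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(Y)
    entry_order = []
    for _ in range(max_steps):
        corr = [sum(X[i][j] * resid[i] for i in range(n)) for j in range(p)]
        j = max(range(p), key=lambda m: abs(corr[m]))
        if abs(corr[j]) < eps:
            break  # no predictor is meaningfully correlated with the residual
        step = eps if corr[j] > 0 else -eps
        beta[j] += step
        for i in range(n):
            resid[i] -= step * X[i][j]
        if j not in entry_order:
            entry_order.append(j)
    return beta, entry_order


# Toy data (hypothetical): three orthonormal predictors, four observations.
X_toy = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
Y_toy = [3.0, -2.0, 0.5, 0.0]
beta_path, order = forward_stagewise(X_toy, Y_toy)
```

The returned `order` plays the role of the sequence $j_1, j_2, \ldots$ along the path: predictors with a stronger relationship to the response enter earlier.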
Since the final model is determined either by the tuning parameter, as in LASSO, or by the predetermined number of variables in the final model, as in DTCCS, unimportant variables might be selected in the final model. The false discovery rate (FDR) can be used to assess the final model (G'Sell et al. 2016; Li & Barber 2017). Let $\mathcal{M}_k = \{j_1, \ldots, j_k\}$ be the solution set of important variables with increasing complexity. G'Sell et al. (2016) proposed the "ForwardStop" testing procedure and a stopping point $\hat{k}$ to control the FDR. Li and Barber (2017) developed a family of "accumulation tests" to choose a cutoff $\hat{k}$ that evaluates the model adequacy and controls the FDR. On the one hand, we would like the model to be simple (parsimony principle); on the other hand, the model should fit the data well (adequacy principle). There is often a trade-off between these two aspects, and the FDR can be used to control the trade-off. A conservative FDR control may induce an oversimplified model. We follow the family of "accumulation tests" and propose a method to control the FDR and maintain a high sensitivity simultaneously for high-dimensional post-selection inference using least angle regression.
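The stopping idea can be sketched as follows. The rule coded here is the ForwardStop rule as commonly stated, $\hat{k} = \max\{k : -\frac{1}{k} \sum_{i=1}^{k} \log(1 - p_i) \le \alpha\}$, applied to p-values ordered along the path. It is not the method proposed in this chapter. The p-values, the variable indices, and the "true" support below are made-up inputs for illustration, and the helper `fdp_and_sensitivity` is a hypothetical name for the usual empirical false discovery proportion and sensitivity.

```python
import math


def forward_stop(p_values, alpha):
    """Largest k whose running mean of -log(1 - p_i) stays at or below alpha (0 if none)."""
    k_hat, running = 0, 0.0
    for k, p in enumerate(p_values, start=1):
        # Guard against log(0) when a p-value is exactly 1
        running += -math.log(max(1.0 - p, 1e-300))
        if running / k <= alpha:
            k_hat = k
    return k_hat


def fdp_and_sensitivity(selected, true_support):
    """Empirical false discovery proportion and sensitivity of a selected set."""
    selected, true_support = set(selected), set(true_support)
    true_pos = len(selected & true_support)
    fdp = (len(selected) - true_pos) / max(len(selected), 1)
    sensitivity = true_pos / max(len(true_support), 1)
    return fdp, sensitivity


# Hypothetical path: variables in order of entry, with one p-value per step.
entry_order = [4, 1, 7, 2, 9, 5]
p_values = [0.001, 0.01, 0.02, 0.6, 0.03, 0.9]

k_hat = forward_stop(p_values, alpha=0.1)
selected = entry_order[:k_hat]
fdp, sens = fdp_and_sensitivity(selected, true_support=[4, 1, 7, 9])
```

In this made-up example the rule stops before variable 9 enters, so the selected model contains no false discoveries but misses one truly important variable. This is the kind of trade-off between FDR control and sensitivity described above.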
The rest of this chapter is organized as follows. Section 2 introduces the inference background and the new method, which controls the FDR and the sensitivity simultaneously. Numerical studies are reported in Sect. 3 to show the superior performance of the proposed method in high-dimensional settings, even when strong multicollinearity exists among the predictors. The conclusion and discussion are given in Sect. 4.