3 Feature Screening for Censored Data with Error-Prone Covariates

Condition (C1) is standard for implementing kernel estimation; the requirement that $\int_{-\infty}^{\infty} u^r K(u)\,du$ be finite for $r \in \mathbb{N}$ is satisfied by the commonly used kernel functions listed in Wand & Jones (1995). Condition (C2) corresponds to the optimal bandwidth in the sense of Wand & Jones (1995, Sect. 2.5); thus, we take $h$ to be of order $n^{-1/5}$ in the following development.
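As a quick numerical illustration of (C1) and (C2), the sketch below checks the moments $\int u^r K(u)\,du$ of the standard normal kernel and computes a bandwidth of order $n^{-1/5}$. The function names are hypothetical. The constant 1.06 (the normal-reference rule of thumb) is an illustrative assumption swapped in here; the numerical work in this chapter chooses $h$ by cross-validation instead.

```python
import numpy as np


def gaussian_kernel(u):
    """Standard normal kernel K(u)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)


def kernel_moment(r, kernel=gaussian_kernel, half_width=12.0, n_grid=24001):
    """Approximate int u^r K(u) du by the trapezoidal rule on [-half_width, half_width]."""
    u = np.linspace(-half_width, half_width, n_grid)
    y = u**r * kernel(u)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(u)))


def rule_of_thumb_bandwidth(x):
    """Illustrative bandwidth h = 1.06 * sd * n^(-1/5) (normal-reference rule, an assumption)."""
    x = np.asarray(x, dtype=float)
    return 1.06 * x.std(ddof=1) * x.size ** (-0.2)


# Even moments of the normal kernel are (r - 1)!!, odd moments vanish; all are finite.
for r in range(5):
    print(r, round(kernel_moment(r), 6))
```

Because $h \propto n^{-1/5}$, multiplying the sample size by 32 halves the bandwidth, since $32^{-1/5} = 1/2$.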


where $\bar{X}_{i\cdot}^* = \frac{1}{n_i}\sum_{r=1}^{n_i} X_{ir}^*$.

Scenario III: Both the functional form of $f(\cdot)$ and its associated parameters are unknown, but external validation data are available.

Suppose that $\mathcal{M}$ is the subject set for the main study, containing measurements $\{(T_i, C_i, \delta_i, X_i^*) : i \in \mathcal{M}\}$ for $n$ subjects, and that $\mathcal{V}$ is the subject set for the external validation study, containing measurements $\{(X_i, X_i^*) : i \in \mathcal{V}\}$ for $m$ subjects, where $\mathcal{M}$ and $\mathcal{V}$ do not overlap. Assume that the main study and the validation study share the same measurement error model (7); this is the so-called transportability assumption (e.g., Yi et al. 2015).

With the availability of external validation data, $f_{\epsilon(j)}(\cdot)$ for $j = 1, \ldots, p$ and $\Sigma_\epsilon$ can be estimated. For $i \in \mathcal{V}$ and $j = 1, \ldots, p$, the $j$th component of $\epsilon_i$ is given by $\epsilon_{i(j)} = X^*_{i(j)} - X_{i(j)}$, which is known. Then adopting the estimator (6) with $X_{i(j)}$ replaced by $\epsilon_{i(j)}$ and $n$ replaced by $m$ gives an estimate of the probability density function $f_{\epsilon(j)}(\cdot)$ of $\epsilon_{i(j)}$:

$$\hat{f}_{\epsilon(j)}(u) = \frac{1}{mh}\sum_{i \in \mathcal{V}} K\left(\frac{u - \epsilon_{i(j)}}{h}\right). \quad (8)$$

Thus, the corresponding characteristic function $\phi_{\epsilon(j)}(u)$ is estimated by $\hat{\phi}_{\epsilon(j)}(u) = \int_{-\infty}^{\infty} \exp(\mathrm{i}ux)\,\hat{f}_{\epsilon(j)}(x)\,dx$. In addition, applying the least squares regression method gives the estimator of $\Sigma_\epsilon$:

$$\hat{\Sigma}_\epsilon = \frac{1}{m-1}\sum_{i \in \mathcal{V}} (X_i^* - X_i)(X_i^* - X_i)^{\top}. \quad (9)$$
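The validation-data step can be sketched in a few lines. This is a minimal sketch that assumes the validation covariates and surrogates are stored as hypothetical $(m \times p)$ NumPy arrays `x_val` and `x_star_val`. It forms the known residuals $\epsilon_i = X_i^* - X_i$ and evaluates (9). Like (9), it does not subtract a sample mean, which is appropriate when the errors have mean zero. Each residual column can then be passed to a kernel density estimator to obtain $\hat f_{\epsilon(j)}$.

```python
import numpy as np


def estimate_error_covariance(x_val, x_star_val):
    """Scenario III sketch: residuals eps_i = X_i^* - X_i on the validation set, and (9).

    x_val, x_star_val: hypothetical (m, p) arrays of true covariates and their surrogates.
    """
    eps = np.asarray(x_star_val, dtype=float) - np.asarray(x_val, dtype=float)
    m = eps.shape[0]
    # (9): sum of outer products (X_i^* - X_i)(X_i^* - X_i)^T divided by m - 1, no centring
    sigma_hat = eps.T @ eps / (m - 1)
    return eps, sigma_hat


rng = np.random.default_rng(1)
x_val = rng.normal(size=(500, 3))
x_star_val = x_val + rng.normal(scale=0.5, size=(500, 3))  # simulated errors with sd 0.5
eps, sigma_hat = estimate_error_covariance(x_val, x_star_val)
print(np.round(sigma_hat, 3))  # diagonal entries should be near 0.25
```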

3.2 Feature Screening with Measurement Error Effects Accommodated

In the presence of measurement error in covariates, the method in Sect. 2.3 cannot be applied, because the estimator (5) cannot be calculated directly when the $X_i$ are unavailable. In this subsection, we derive a counterpart of the estimator (5) based on the observed surrogates $X_i^*$. First, we re-express the probability density function $f_{X(j)}(x)$ through the inverse Fourier transform:

$$f_{X(j)}(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \exp(-\mathrm{i}ux)\,\phi_{X(j)}(u)\,du, \quad (10)$$

where $\phi_{X(j)}(u)$ is the characteristic function of $X_{(j)}$.

For $j = 1, \ldots, p$, let $\phi_{X^*(j)}(u)$ and $\phi_{\epsilon(j)}(u)$ denote the characteristic functions of $X^*_{(j)}$ and $\epsilon_{(j)}$, respectively, where $X^*_{(j)}$ and $\epsilon_{(j)}$ are the $j$th components of $X^*$ and $\epsilon$, respectively, and $X^*$ and $\epsilon$ follow the same distributions as $X_i^*$ and $\epsilon_i$, respectively.

Then model (7) yields $\phi_{X^*(j)}(u) = \phi_{X(j)}(u)\,\phi_{\epsilon(j)}(u)$, and thus $\phi_{X(j)}(u) = \phi_{X^*(j)}(u)/\phi_{\epsilon(j)}(u)$, assuming $\phi_{\epsilon(j)}(u) \neq 0$. Then (10) becomes

$$f_{X(j)}(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \exp(-\mathrm{i}ux)\,\frac{\phi_{X^*(j)}(u)}{\phi_{\epsilon(j)}(u)}\,du. \quad (11)$$

To emphasize that (11) is expressed in terms of the surrogate $X^*_{(j)}$, we write $f_{\mathrm{adj},j}(x)$ in place of $f_{X(j)}(x)$ on the left-hand side of (11).

Next, to implement (11), we need to calculate $\phi_{X^*(j)}(u)$ and $\phi_{\epsilon(j)}(u)$, where $\phi_{\epsilon(j)}(u)$ is derived from the distribution $f_{\epsilon(j)}(\cdot)$ of $\epsilon_{(j)}$, the $j$th marginal distribution derived from $f(\cdot)$.

It now remains to calculate $\phi_{X^*(j)}(u)$, which is given by

$$\phi_{X^*(j)}(u) = \int_{-\infty}^{\infty} \exp(\mathrm{i}ux)\, f_{X^*(j)}(x)\,dx, \quad (12)$$

where $f_{X^*(j)}(x)$ denotes the probability density function of $X^*_{(j)}$. Since $X^*_{(j)}$ is observable, its probability density function can be estimated by kernel estimation:

$$\hat{f}_{X^*(j)}(x) = \frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{x - X^*_{i(j)}}{h}\right), \quad (13)$$

where $h$ and $K(\cdot)$ are as described for (6). In our numerical examination, we specify $K(u)$ to be the normal kernel, and $h$ can be estimated by the cross-validation method (e.g., Wand & Jones 1995).
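The kernel estimate (13) with a normal kernel, and one way to choose $h$ by cross-validation, can be sketched as follows. Least-squares cross-validation is assumed here as the specific criterion, since the text does not say which cross-validation variant is used. The bandwidth grid and function names are hypothetical. For the normal kernel, $\int \hat f^2$ has a closed form because the convolution of two standard normal kernels is the $N(0, 2)$ density.

```python
import numpy as np

SQRT_2PI = np.sqrt(2.0 * np.pi)


def kde_normal(x_grid, data, h):
    """Kernel density estimate (13) with a standard normal kernel."""
    data = np.asarray(data, dtype=float)
    z = (np.asarray(x_grid, dtype=float)[:, None] - data[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (data.size * h * SQRT_2PI)


def lscv_score(data, h):
    """Least-squares CV criterion: int fhat^2 - (2/n) sum_i fhat_{-i}(X_i)."""
    n = data.size
    d = (data[:, None] - data[None, :]) / h
    # Convolving two N(0, 1) kernels gives the N(0, 2) density, so int fhat^2 is closed form.
    int_f2 = np.exp(-0.25 * d**2).sum() / (n**2 * h * np.sqrt(4.0 * np.pi))
    k = np.exp(-0.5 * d**2) / SQRT_2PI
    np.fill_diagonal(k, 0.0)  # leave observation i out of its own estimate
    loo = k.sum(axis=1) / ((n - 1) * h)
    return int_f2 - 2.0 * loo.mean()


def lscv_bandwidth(data, grid):
    """Return the grid value of h that minimizes the LSCV criterion."""
    data = np.asarray(data, dtype=float)
    scores = [lscv_score(data, h) for h in grid]
    return float(grid[int(np.argmin(scores))])


rng = np.random.default_rng(2)
x_star_j = rng.normal(size=200)  # stand-in for one surrogate column X*_(j)
h_cv = lscv_bandwidth(x_star_j, np.linspace(0.05, 2.0, 80))
density_at_zero = kde_normal([0.0], x_star_j, h_cv)[0]
```

A grid search keeps the sketch transparent. A one-dimensional optimizer could replace it, but the LSCV criterion can have several local minima, so a grid is a reasonable first pass.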

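Consequently, with $f_{X^*(j)}(x)$ in (12) replaced by $\hat{f}_{X^*(j)}(x)$, $\phi_{X^*(j)}(u)$ can be estimated by

$$\hat{\phi}_{X^*(j)}(u) = \int_{-\infty}^{\infty} \exp(\mathrm{i}ux)\,\hat{f}_{X^*(j)}(x)\,dx = \int_{-\infty}^{\infty} \exp(\mathrm{i}ux)\,\frac{1}{nh}\sum_{i=1}^{n} K\left(\frac{x - X^*_{i(j)}}{h}\right)dx.$$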

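Let $z = (x - X^*_{i(j)})/h$; then applying the change of variables yields

$$\begin{aligned}
\hat{\phi}_{X^*(j)}(u) &= \int_{-\infty}^{\infty} \frac{1}{n}\sum_{i=1}^{n} \exp\left(\mathrm{i}uX^*_{i(j)} + \mathrm{i}uhz\right) K(z)\,dz \\
&= \left\{\int_{-\infty}^{\infty} \exp(\mathrm{i}uhz)\,K(z)\,dz\right\} \times \left\{\frac{1}{n}\sum_{i=1}^{n} \exp\left(\mathrm{i}uX^*_{i(j)}\right)\right\}. \quad (14)
\end{aligned}$$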
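With the standard normal kernel, the first factor in (14) is $\int \exp(\mathrm{i}uhz)K(z)\,dz = \exp(-h^2u^2/2)$. The estimate $\hat\phi_{X^*(j)}$ is therefore a damped empirical characteristic function. The sketch below uses hypothetical function names and compares this closed form with direct numerical integration of $\exp(\mathrm{i}ux)\hat f_{X^*(j)}(x)$, as a sanity check of the change of variables.

```python
import numpy as np


def cf_closed_form(u, x_star, h):
    """Right-hand side of (14) for a standard normal kernel: exp(-(hu)^2/2) times the empirical CF."""
    u = np.asarray(u, dtype=float)
    return np.exp(-0.5 * (h * u) ** 2) * np.exp(1j * np.outer(u, x_star)).mean(axis=1)


def cf_by_quadrature(u, x_star, h, n_grid=20001):
    """Integrate exp(iux) * fhat(x) directly with the trapezoidal rule (for checking only)."""
    u = np.asarray(u, dtype=float)
    x = np.linspace(x_star.min() - 12 * h, x_star.max() + 12 * h, n_grid)
    z = (x[:, None] - x_star[None, :]) / h
    f_hat = np.exp(-0.5 * z**2).sum(axis=1) / (x_star.size * h * np.sqrt(2.0 * np.pi))
    weights = np.full(n_grid, x[1] - x[0])
    weights[[0, -1]] *= 0.5
    return (np.exp(1j * np.outer(u, x)) * f_hat) @ weights


rng = np.random.default_rng(3)
x_star_j = rng.normal(size=100)
u = np.linspace(-3.0, 3.0, 13)
gap = np.max(np.abs(cf_closed_form(u, x_star_j, 0.4) - cf_by_quadrature(u, x_star_j, 0.4)))
```

In practice only the closed form is needed. The quadrature version is there to confirm that the two factors in (14) are computed correctly.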

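Combining (11) and (14) gives an estimator of (11):

$$\hat{f}_{\mathrm{adj},j}(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \exp(-\mathrm{i}ux)\,\frac{\hat{\phi}_{X^*(j)}(u)}{\phi_{\epsilon(j)}(u)}\,du, \quad (15)$$

and thus an adjusted estimator of the cumulative distribution function $F_{X(j)}(x)$ in terms of $X^*_{(j)}$ is

$$\hat{F}_{\mathrm{adj},j}(x) = \int_{-\infty}^{x} \hat{f}_{\mathrm{adj},j}(u)\,du. \quad (16)$$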
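Below is a minimal numerical sketch of (15) and (16). It assumes a standard normal kernel and a known error characteristic function `phi_eps`; normal errors are used in the example purely for illustration. Two practical choices are assumptions not taken from the text. First, the $u$-integral is truncated to $[-U, U]$ and evaluated by the trapezoidal rule. Second, the CDF is accumulated on a finite $x$ grid. With a normal kernel and normal errors, the ratio in (15) equals $\exp\{-(h^2-\sigma^2)u^2/2\}$ times the empirical characteristic function. The ratio therefore stays bounded only when $h > \sigma$; otherwise truncation is what keeps the integral finite.

```python
import numpy as np


def damped_ecf(u, x_star, h):
    """(14) with a standard normal kernel: exp(-(hu)^2/2) * (1/n) sum_i exp(iu X*_i)."""
    return np.exp(-0.5 * (h * u) ** 2) * np.exp(1j * np.outer(u, x_star)).mean(axis=1)


def deconvolved_density(x_grid, x_star, h, phi_eps, u_max=15.0, n_u=1501):
    """(15) with the u-integral truncated to [-u_max, u_max] (an assumption), trapezoid weights."""
    u = np.linspace(-u_max, u_max, n_u)
    ratio = damped_ecf(u, x_star, h) / phi_eps(u)
    weights = np.full(n_u, u[1] - u[0])
    weights[[0, -1]] *= 0.5
    integrand = np.exp(-1j * np.outer(x_grid, u)) * ratio
    # The integrand is conjugate-symmetric in u, so the result is real up to round-off.
    return (integrand @ weights).real / (2.0 * np.pi)


def deconvolved_cdf(x_grid, f_adj):
    """(16): cumulative trapezoidal integral of f_adj from the left end of x_grid."""
    steps = 0.5 * (f_adj[1:] + f_adj[:-1]) * np.diff(x_grid)
    return np.concatenate(([0.0], np.cumsum(steps)))


# Illustration: X ~ N(0, 1) observed with normal errors of known sd 0.5.
rng = np.random.default_rng(4)
sigma = 0.5
x_star = rng.normal(size=1000) + rng.normal(scale=sigma, size=1000)
x_grid = np.linspace(-6.0, 6.0, 401)
f_adj = deconvolved_density(x_grid, x_star, h=0.6,
                            phi_eps=lambda u: np.exp(-0.5 * (sigma * u) ** 2))
F_adj = deconvolved_cdf(x_grid, f_adj)
```

A naive kernel estimate computed directly from `x_star` is flattened by the extra error variance. Comparing it with `f_adj` near 0 is a simple way to see the correction at work.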

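Therefore, the functional distance correlation (3) can be estimated using the observed surrogate $X^*_{(j)}$ together with the outcome $Y$:

$$\hat{\omega}_j = \widehat{\mathrm{dcorr}}\{\hat{F}_{\mathrm{adj},j}(X^*_{(j)}), \hat{F}(Y)\} = \frac{\widehat{\mathrm{dcov}}\{\hat{F}_{\mathrm{adj},j}(X^*_{(j)}), \hat{F}(Y)\}}{\sqrt{\widehat{\mathrm{dcov}}\{\hat{F}_{\mathrm{adj},j}(X^*_{(j)}), \hat{F}_{\mathrm{adj},j}(X^*_{(j)})\}\;\widehat{\mathrm{dcov}}\{\hat{F}(Y), \hat{F}(Y)\}}}, \quad (17)$$

where $\widehat{\mathrm{dcov}}\{\hat{F}_{\mathrm{adj},j}(X^*_{(j)}), \hat{F}(Y)\}$ is determined by (4) with $F_{X(j)}(x)$ replaced by (16).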
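The estimator (17) is a distance correlation computed on the transformed values $\hat F_{\mathrm{adj},j}(X^*_{ij})$ and $\hat F(Y_i)$. The sketch below assumes that (4) is the usual double-centred V-statistic form of the squared sample distance covariance. It takes the two transformed vectors as given inputs; how $\hat F(Y)$ handles censoring is outside this sketch, and the function names are hypothetical.

```python
import numpy as np


def _double_centred_distances(a):
    """Pairwise |a_i - a_k| with row, column and grand means removed."""
    a = np.asarray(a, dtype=float).reshape(-1)
    d = np.abs(a[:, None] - a[None, :])
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def dcov2(a, b):
    """Squared sample distance covariance (V-statistic form)."""
    return float(np.mean(_double_centred_distances(a) * _double_centred_distances(b)))


def dcorr(a, b):
    """Distance correlation in the form dcov2(a, b) / sqrt(dcov2(a, a) * dcov2(b, b))."""
    denom = np.sqrt(dcov2(a, a) * dcov2(b, b))
    return dcov2(a, b) / denom if denom > 0 else 0.0


# Hypothetical inputs: u_j[i] plays F_adj,j(X*_ij) and v[i] plays F(Y_i), both in [0, 1].
rng = np.random.default_rng(5)
u_j = rng.uniform(size=200)
v_indep = rng.uniform(size=200)
v_dep = (u_j - 0.5) ** 2  # dependent on u_j but uncorrelated with it
print(dcorr(u_j, v_indep), dcorr(u_j, v_dep))
```

The second pair is dependent but has zero Pearson correlation in the population. This illustrates why a distance correlation, rather than a linear correlation, is used for screening.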

Remark The development here extends the discussion of Chen (2019), who assumed that $f(\cdot)$ is the probability density function of a normal distribution under Scenarios I, II, and III. If the $j$th noise term $\epsilon_{(j)}$ follows a normal distribution with mean zero and variance $\sigma^2_{\epsilon(j)}$, its characteristic function is $\phi_{\epsilon(j)}(u) = \exp\left(-\frac{1}{2}u^2\sigma^2_{\epsilon(j)}\right)$, and thus (15) becomes

$$\hat{f}_{\mathrm{adj},j}(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \exp\left(-\mathrm{i}ux + \frac{1}{2}u^2\sigma^2_{\epsilon(j)}\right)\hat{\phi}_{X^*(j)}(u)\,du.$$

In contrast, if $\epsilon_{(j)}$ follows a $t$ distribution with $v > 1$ degrees of freedom, then its characteristic function is given by Dreier & Kotz (2002):

$$\phi_{\epsilon(j)}(u) = \frac{2^{v} v^{v/2}}{\Gamma(v)}\int_{0}^{\infty} \exp\left\{-v^{1/2}(2x + |u|)\right\}\{x(x + |u|)\}^{(v-1)/2}\,dx, \quad (18)$$

and substituting (18) into (15) yields $\hat{f}_{\mathrm{adj},j}(x)$.
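The integral in (18) is one-dimensional and easy to evaluate numerically. The sketch below uses a hypothetical function name, and the trapezoidal rule on a truncated grid is an assumption. It checks the integral against the known closed forms for $v = 3$, $\phi(u) = (1 + \sqrt{3}|u|)e^{-\sqrt{3}|u|}$, and $v = 5$, $\phi(u) = (1 + \sqrt{5}|u| + 5u^2/3)e^{-\sqrt{5}|u|}$. Note that (18) is the characteristic function of a standard $t_v$ variable; an error $s\,t_v$ with scale $s$ would use $\phi(su)$.

```python
import math

import numpy as np


def phi_t(u, v, n_grid=50001):
    """Characteristic function (18) of a standard t_v variable, by trapezoidal quadrature."""
    if v < 1:
        raise ValueError("this sketch assumes v >= 1 so the integrand is finite at x = 0")
    a = abs(u)
    root_v = math.sqrt(v)
    # The integrand decays like exp(-2 sqrt(v) x); integrate far enough that the tail is negligible.
    upper = (v + 60.0) / (2.0 * root_v)
    x = np.linspace(0.0, upper, n_grid)
    integrand = np.exp(-2.0 * root_v * x) * (x * (x + a)) ** ((v - 1.0) / 2.0)
    integral = np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(x))
    # exp(-sqrt(v)|u|) does not depend on x, so it is pulled out of the integral.
    return 2.0**v * v ** (v / 2.0) / math.gamma(v) * math.exp(-root_v * a) * integral


print(phi_t(0.7, 3), (1 + math.sqrt(3) * 0.7) * math.exp(-math.sqrt(3) * 0.7))
```

Each evaluation costs one quadrature. When (15) needs $\phi_{\epsilon(j)}$ on a fine $u$ grid, tabulating it once and reusing the table is the natural design.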

3.3 Asymptotic Results

To establish theoretical results of the proposed method, we impose the following additional conditions:

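(C3) There exists a positive constant $w_0$ such that for all $0 < w \le 2w_0$,

$$\sup_{p}\max_{1 \le j \le p} E\left\{\exp\left(w\|X_{(j)}\|_1^2\right)\right\} < \infty \quad \text{and} \quad E\left\{\exp\left(w\|Y\|_q^2\right)\right\} < \infty.$$

(C4) The minimum of the functional distance correlations for the active covariates satisfies $\min_{j \in I}\omega_j \ge 2cn^{-\zeta}$ for some constants $c > 0$ and $0 \le \zeta < 1/2$.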

(C5) There exists a positive constant $v_0$ such that $\lim_{p \to \infty}\left(\min_{j \in I}\omega_j - \max_{j \in I^c}\omega_j\right) \ge v_0$, assuming the limit exists.

(C6) The covariates $X_i^*$ for $i = 1, \ldots, n$ are bounded.

Condition (C3) is used to bound the difference $\hat{\omega}_j - \omega_j$ between (3) and its estimator (17). Condition (C4) says that the marginal DC of the active covariates cannot be too small, which is similar to Condition 3 of Fan & Lv (2008). Condition (C5) basically requires that, as the dimension $p$ goes to infinity, the signal carried by the active covariates exceed that carried by the inactive covariates by at least a fixed amount. This condition was also imposed by other authors (e.g., Cui et al. 2015). Condition (C6) requires the surrogate measurements of the covariates to be bounded.

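Theorem 1 Under regularity conditions (C3) and (C5) and the assumptions of Lemmas 1 and 2 in Appendix A, for $c$ and $\zeta$ described in Condition (C4), there exists a constant $D > 0$ such that

$$P\left(\max_{j = 1, \ldots, p}\left|\hat{\omega}_j - \omega_j\right| \ge cn^{-\zeta}\right) \le O\left\{p\exp\left(-Dn^{1-2\zeta}\right)\right\}. \quad (19)$$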

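Moreover,

$$P\left(\max_{j \in I^c}\hat{\omega}_j \ge \min_{j \in I}\hat{\omega}_j\right) \le O\left\{\exp\left(-\frac{1}{4}Dnv_0^2\right)\right\}, \quad (20)$$

where $v_0$ is the constant described in Condition (C5).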

Equation (19) in Theorem 1 indicates that $\hat{\omega}_j$ is close to $\omega_j$ with high probability, uniformly in $j$. Similar to the discussion in Li et al. (2012) and Chen et al. (2018), (19) shows that the proposed method can handle non-polynomial (NP) dimensionality of order $\log p = o(n^{1-2\zeta})$ for some constant $0 \le \zeta < 1/2$. Under this order, $p\exp(-Dn^{1-2\zeta}) = \exp(\log p - Dn^{1-2\zeta}) \to 0$.

Equation (20) in Theorem 1 ensures that the proposed estimator (17) has the ranking consistency property, similar to that discussed by Cui et al. (2015) and Hao et al. (2019).

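Theorem 2 Suppose that Conditions (C3)–(C4) and the assumptions of Lemmas 1 and 2 in Appendix A hold. Let

$$\hat{I} = \left\{j : \hat{\omega}_j \ge cn^{-\zeta} \text{ for } j = 1, \ldots, p\right\} \quad (21)$$

for $c$ and $\zeta$ described in Condition (C4). Then, for sufficiently large $n$, $\hat{I}$ has the sure screening property:

$$P\left(I \subseteq \hat{I}\right) \ge 1 - O\left\{q\exp\left(-Dn^{1-2\zeta}\right)\right\},$$

where $D$ and $\zeta$ are the constants described in Theorem 1.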

The sure screening property in Theorem 2 shows that, with high probability, the true active set is included in the estimated active set. This property is commonly required of any sensible screening procedure (e.g., Fan & Lv 2008; Li et al. 2012; Chen et al. 2018).

While (21) allows us to establish the sure screening property of the procedure, it does not tell us how to choose a suitable threshold value, because $c$ and $\zeta$ are unknown. In practice, we often rank the covariates by the values of $\hat{\omega}_j$ for $j = 1, \ldots, p$ and then retain, say, the $q$ covariates with the $q$ largest values of $\hat{\omega}_j$. A common choice is $q = \lfloor n/\log n \rfloor$, where $\lfloor\cdot\rfloor$ stands for the floor function (e.g., Li et al. 2012; Cui et al. 2015; Yan et al. 2017; Chen et al. 2018; Chen 2019).
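Both selection rules fit in a few lines. This sketch assumes the estimates $\hat\omega_1, \ldots, \hat\omega_p$ are already stored in a hypothetical NumPy array `omega_hat`. The threshold rule (21) needs $c$ and $\zeta$, which are unknown in practice, so any values passed in are arbitrary. The top-$q$ rule uses $q = \lfloor n/\log n \rfloor$, with the natural logarithm assumed here.

```python
import math

import numpy as np


def screen_top_q(omega_hat, n):
    """Keep the q = floor(n / log n) covariates with the largest omega_hat (natural log assumed)."""
    omega_hat = np.asarray(omega_hat, dtype=float)
    q = min(math.floor(n / math.log(n)), omega_hat.size)
    top = np.argsort(omega_hat)[::-1][:q]  # indices of the q largest values
    return np.sort(top)


def screen_threshold(omega_hat, c, zeta, n):
    """Threshold rule (21): keep j with omega_hat_j >= c * n^(-zeta); c, zeta user-supplied."""
    omega_hat = np.asarray(omega_hat, dtype=float)
    return np.flatnonzero(omega_hat >= c * n ** (-zeta))


omega_hat = np.linspace(0.0, 0.98, 50)  # hypothetical screening statistics for p = 50
print(screen_top_q(omega_hat, n=200))  # q = floor(200 / log 200) = 37 covariates
```

The top-$q$ rule always returns the same number of covariates regardless of the scale of $\hat\omega_j$. This is why it is preferred when $c$ and $\zeta$ cannot be calibrated.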

Source: Advances and Innovations in Statistics and Data Science (ICSA Book Series in Statistics), Wenqing He, Liqun Wang, Jiahua Chen, Chunfang Devon Lin, Springer (2022), pages 44–49.