Cut-off point of linear discriminant rule for large dimension
Takayuki Yamada1, Tetsuto Himeno2and Tetsuro Sakurai3
1 General Studies, College of Engineering, Nihon University,
1 Nakagawara, Tokusada, Tamuramachi, Koriyama, Fukushima 963-8642, Japan
2Department of Computer and Information Science, Faculty of Science and Technology, Seikei University,
3-3-1 Kichijoji-Kitamachi, Musashino-shi, Tokyo 180-8633, Japan
3Center of General Education, Tokyo University of Science, Suwa,
5000-1, Toyohira, Chino-shi, Nagano, 391-0292, Japan
Abstract
This paper is concerned with the problem of classifying a observation vector into one of two pop- ulations Π1:Np(µ1,Σ) and Π2:Np(µ2,Σ). Anderson (1973, Ann. Statist.) gave an asymptotic expansion of the studentized statistic, and derived cut-off point to achieve a specified probability of misclassification. But the dimensionpgets large, the precision becomes worse. So in this paper, we proposed studentized statistic in terms of (n, p) asymptotic. An asymptotic expansion of the statis- tic is derived up to the orderO1, whereO1is a term with respect to{p−1/2, N1−1/2, N2−1/2, m−1/2} for each sample sizeNi andm=N1+N2−2−p. Using the expansion, we gave cut-off point to achieve a specified probability of misclassification.
Keywords: Linear discriminant rule, cut-off point, (n, p) asymptotic AMS 2010 subject classification: Primary 62H30, Secondary 62E20
1 Introduction
This paper is concerned with the problem of classifying a observation vector into one of two populations Π1 :Np(µ1,Σ) and Π2 :Np(µ2,Σ). The observation x is classified as coming from either Π1 or Π2 based on the samples
xi1, . . . ,xiNi ∼Np(µi,Σ) (i= 1,2),
which are independent. For this problem, linear discriminant analysis is used. Let W = (¯x1−x¯2)′S−1
{ x−1
2(¯x1+ ¯x2) }
,
where ¯x1, ¯x2andS are the sample mean vectors and the pooled sample covariance matrix defined by
¯ xi= 1
Ni Ni
∑
j=1
xij, i= 1,2, S= 1 n
∑2
i=1 Ni
∑
j=1
(xij−x¯i)(xij−x¯i)′, n=N−2 =N1+N2−2.
Linear discriminant rule classifiesxas Π1ifW(x)> cand Π2ifW(x)< cfor a constantc. Inference concerning linear discriminant analysis is studied under large sample asymptotic framework A0:
A0 :N1→ ∞, N2→ ∞, N1/N2→c∈(0,∞).
1E-mail address: [email protected]
2E-mail address: [email protected]
3E-mail address: [email protected]
For a review of results under A0, see, e.g., Fujikoshi et al. [4]. When p becomes large, accuracy of results proposed using A0 gets worse. As a way to improve the poorness, it is studied under the asymptotic framework A1:
A1 :p→ ∞, N1→ ∞, N2→ ∞, p/N1→γ1∈(0,1), p/N2→γ2∈(0,1) and N1/N2→γ∈(0,∞).
As results under A1, Fujikoshi and Seo [3] gave an asymptotic approximation of the probabilities of misclassification, Fujikoshi [2] gave its error bound. These results are reviewed in Fujikoshi et al. [4].
Following Lachenbruch [5], forx∈Πi, W = (¯x1−x¯2)′S−1
{ x−1
2(¯x1+ ¯x2) }
=V1/2Zi+ (−1)iUi, (1) where
V = (¯x1−x¯2)′S−1ΣS−1(¯x1−x¯2), Zi=V−1/2(¯x1−x¯2)′S−1(x−µi), Ui= (−1)i+1(¯x1−x¯2)′S−1(¯xi−µi)−1
2D2,
and D2 is the squared sample Maharanobius distance defined by D2 = (¯x1 −x¯2)′S−1(¯x1−x¯2).
From the normality of x, Zi is distributed as the standard normal distribution, which we denote it as Zi ∼ N(0,1). Since it does not depend on {x¯1,x¯2,S}, Zi is independent to the set, and so Zi
is independent from {Ui, V}. The limiting distribution of W under the asymptotic framework A1 is normal with meanui,0= (−1)ilimA1E[Ui] and variancev0= limA1{E[V] + Var(Ui)}ifx∈Np(µi,Σ).
Under the assumption that the Mahalanobis distance ∆ =
√
(µ1−µ2)′Σ−1(µ1−µ2) converges to a positive constant under A1, Fujikoshi and Seo [3] and Fujikoshi [2] showed that Var(Ui)→0. So, we can abbreviate asv0= limA1E[V].
One may want to determine the cut-off point c to adjust the probabilities of misclassification.
Results under A0 is written in Anderson [1]. On the other hand, from Fujikoshi [2], under the as- sumption thatx∈Πi, the limiting distribution (W −ui)/√
v isN(0,1), whereui andv are constants such that limA1(ui−ui,0) = 0 and limA1(v−v0) = 0. Using this result, we find that the approxi- mation of the misclassification probability that xis allocated to Πj even thoughx∈ Πi is given as Φ((−1)i−1(c−ui)/√
v) fori, j = 1,2 with i̸= j, where Φ(.) is the cumulative distribution function of the standard normal distribution. Since ui and v contain ∆2, which need to be estimated. The unbiased estimator is given as
∆c2=n−p−1
n (¯x1−x¯2)′S−1(¯x1−x¯2)− pN N1N2
.
Consistency under the asymptotic framework A1 holds. One can choose c from the fact that the limiting distribution of (W−ubi)/√
ˆ
visN(0,1) ifx∈Np(µi,Σ), where (ubi,v) is (uˆ i, v) with replacing
∆2 by∆c2.
This paper is organized as follows. In Section 2, we give an asymptotic expansion of the (W−ubi)/√ ˆ v for the case thatx∈Πi. We propose a cut-off point which the misclassification probability becomes presetting level, asymptotically. Section 3 presents simulation results for misclassification probability.
Proof of lemma and derivation of expectations are given in Appendix.
2 Asymptotic expansion under A1
For technical reason, set
ui= n 2(m−1)
{
(−1)i+1∆2+ ( p
N2 − p N1
)}
, v= n2(n+ 1)
(m−1)(m+ 1)(m+ 2) (
∆2+ N p N1N2
) ,
where m =n−p. Note that ui = (−1)iE[Ui], but v equals E[V], asymptotically, under A1. Then limA1ui=ui,0 and limA1v=v0. Using unbiased estimator of (ui, v), we have
P (
W√−cu1
ˆ v < x
x∈Π1
)
=E [
Φ (√
ˆ
vx+√U1+cu1
V
)]
,
P (
W−cu2
√vˆ > x x∈Π2
)
=E [
Φ (−√
ˆ
vx+U2−uc2
√V
)]
. Let
u1= ( 1
N1 + 1 N2
)−1/2
Σ−1/2(¯x1−x¯2), u2= 1
√NΣ−1/2(N1x¯1+N2x¯2−N1µ1−N2µ2), B=Σ−1/2SΣ−1/2,
where N = N1 +N2. Then u1, u2 and B are independent. In addition, u1 ∼ Np((1/N1 + 1/N2)−1/2δ,Ip) and u2 ∼ Np(0,Ip), where δ = Σ−1/2(µ1 −µ2). It also holds that nB is dis- tributed as a Wishart distribution with degrees of freedomn=N−2 and covariance matrixIp, which is denoted asWp(n,Ip). Substituting them,
Ui=−(−1)i+1 2
( p N2 − p
N1
)u′1B−1u1
p +(−1)i+1p
√N1N2
u′1B−1u2
p −τi
δ′B−1u1
√p , V = N p
N1N2
u′1B−2u1
p , τi=
√
pN3/2+(−1)i+1/2
N N3/2−(−1)i+1/2
. In addition,
∆c2= N p N1N2
{m−1 n
u′1B−1u1 p −1
} . Then,
b
ui= (−1)i+1 n 2(m−1)
{ N p N1N2
m−1 n
u′1B−1u1
p − N p
N1N2 + (−1)i+1 ( p
N2 − p N1
)}
, ˆ
v= n(n+ 1) (m+ 1)(m+ 2)
N p N1N2
u′1B−1u1
p .
The following lemma gives that these random variables can be expressed as functions of the inde- pendent standard normal and chi-squared variables, simultaneously.
Lemma 1. Let v1∼Np(δ,Ip),v2∼Np(0,Ip),A∼Wp(n,Ip), andv1,v2 andAare independent.
Then the following equalities in distribution hold, simultaneously:
S ≡δ′A−1v1=D ∆ Y1
(
Z1+ ∆−
√Y2
Y3
Z2
) ,
T ≡v′2A−1v1=D
√ 1 Y12
( 1 + Y2
Y3
)
{(Z1+ ∆)2+Z22+Y4}Z3 U ≡v′1A−1v1=D 1
Y1{(Z1+ ∆)2+Z22+Y4}, V ≡v′1A−2v1=D 1
Y12 (
1 +Y2
Y3
)
{(Z1+ ∆)2+Z22+Y4},
where ∆ = √
δ′δ; Z1, Z2, Z3, Y1, . . . , Y4 are independent, Zi ∼ N(0,1), i = 1,2,3, Yi ∼ χ2f
i, i = 1, . . . ,4,
f1=n−p+ 1, f2=p−1, f3=n−p+ 2, f4=p−2.
Note that Fujikoshi and Seo [3] has also given similar results, but their results are individual ones, so cannot treat simultaneously. The proof of Lemma 1 is given in Appendix.
It can be described that (−1)i+1√
b
vx+Ui−(−1)iubi
= (−1)i+1
√
n(n+ 1) (m+ 1)(m+ 2)ω−1
√ Q1
p x−(−1)i+1 2
( p N2 − p
N1 )Q1
p +(−1)i+1p
√N1N2
B2
p −τi
B1
√p +ω−2
2 Q1
p − n
2(m−1)ω−2+(−1)i+1 2
n m−1
( p N2 − p
N1 )
, (2)
V =ω−2Q2
p , (3)
whereQ1=u′1B−1u1,Q2=u′1B−2u1,B1=δ′B−1u1andB2=u′2B−1u1,ω2=N1N2/(N p). From Lemma 1, we have
Q1
p
=D n f1
1 1 +√
2/f1W1
S, B1
√p
=D n f1
∆ 1 +√
2/f1W1
( Z1
√p+ω∆−
√ f2
f3
√TZ2
√p )
, B2
p
=D n f1
1 1 +√
2/f1W1
√(
1 +f2 f3
T )
SZ3
√p, Q2
p
=D n2 f12
1 (1 +√
2/f1W1)2 (
1 +f2
f3T )
S, whereWi=√
fi/2(Yi/fi−1) fori= 1, . . . ,4, S=
(Z1
√p+ω∆
)2
+ (Z2
√p )2
+p−2 p
( 1 +
√2 f4
W4
) , T =1 +√
2/f2W2
1 +√ 2/f3W3
.
SortingS in descending order, it can be expressed as
S=s0+S1/2+S1+O3/2, where
s0= 1 +ω2∆2, S1/2= 2ω∆
√p Z1+
√2 f4W4, S1= Z12
p +Z22 p −2
p,
andOj/2is a term ofj-th order with respect to{p−1/2, N1−1/2, N2−1/2, m−1/2}. By Maclaurin expansion of (1 +√
2/f3W3)−1 up to the term with order off3−1, T =
( 1 +
√2 f2
W2
) ( 1−
√2 f3
W3+ 2 f3
W32 )
+O3/2,
which can be sorted in descending order as
T = 1 +T1/2+T1+O3/2
with
T1/2=
√2 f2W2−
√2 f3W3, T1= 2
f3
W32− 2
√f2f3
W2W3. Doing Maclaurin expansion of (1 +√
2/f1W1)−1 in Q1/pup to the term with order off1−1, and sort it in descending order, it is written as
Q1
p =q1,0+Q1,1/2+Q1,1+O3/2, where
q1,0= n f1s0, Q1,1/2= n
f1
( S1/2−
√2 f1
s0W1 )
, Q1,1= n
f1
( S1−
√2 f1
S1/2W1+ 2 f1
s0W12 )
, and using this expansion,
√ Q1
p =√q1,0
{ 1 + 1
2q1,0(Q1,1/2+Q1,1)− 1
8q1,02 Q21,1/2 }
+O3/2. Using similar way, we can expressQ2/pas
Q2
p =q2,0+Q2,1/2+Q2,1+O3/2, where
q2,0= (n
f1
)2( 1 +f2
f3
) s0, Q2,1/2=
(n f1
)2[{(
1 + f2 f3
)
S1/2+f2 f3
s0T1/2 }
−2
√2 f1
( 1 +f2
f3
) s0W1
] , Q2,1=
(n f1
)2[{(
1 + f2
f3 )
S1+f2
f3S1/2T1/2+f2
f3s0T1 }
−2
√2 f1
{(
1 + f2
f3 )
S1/2+f2
f3s0T1/2 }
W1 +6
f1 (
1 + f2
f3 )
s0W12 ]
. In addition, it is also expanded that
B1
√p =b1,0+B1,1/2+B1,1+O3/2, where
b1,0= n f1ω∆2, B1,1/2= n
f1
{(
Z1
√p−
√ f2 f3
Z2
√p )
∆−
√2 f1
ω∆2W1 }
,
B1,1= n f1
{
−∆ 2
√ f2
f3 Z2
√pT1/2−
√2 f1
( Z1
√p−
√ f2
f3 Z2
√p )
∆W1+ 2
f1ω∆2W12 }
.
Sorting in descending order, it can be described that (
1 + f2
f3T )
S=s0
( 1 +f2
f3 )
(1 + ˜T1/2+ ˜S1/2) +O1, where ˜S1/2=S1/2/s0 and ˜T1/2={(f2/f3)/(1 +f2/f3)}T1/2, and so
√(
1 +f2 f3
T )
S=
√ s0
( 1 + f2
f3
) ( 1 + 1
2( ˜T1/2+ ˜S1/2) )
+O1. Substituting this expansion intoB2/p, and using Maclaurin expansion of (1 +√
2/f1W1)−1 up to the term with order off1−1/2, we have
B2
p =B2,1/2+B2,1+O3/2, where
B2,1/2= n f1
√(
1 + f2 f3
) s0
Z3
√p,
B2,1= n f1
√(
1 +f2
f3 )
s0
{1
2( ˜T1/2+ ˜S1/2)−
√2 f1W1
} Z3
√p.
Substituting these expansions in (2) and (3), and coordinating it in order, (2) and (3) can be represented as the sum of terms with descending order, respectively, which are
(−1)i+1√ ˆ
vx+Ui−(−1)iubi= (−1)i+1
√
n2(n+ 1)
(m+ 1)2(m+ 2)s0ω−1x+Ui,1/2+Ui,1+O3/2, V =ω−2 n2(n+ 1)
(m+ 1)2(m+ 2)s0+ω−2Q2,1/2+ω−2Q2,1+O3/2, where
Ui,1/2=AiQ1,1/2−τiB1,1/2+(−1)i+1p
√N1N2
B2,1/2, Ui,1=AiQ1,1−(−1)i+1
√
n(n+ 1) (m+ 1)(m+ 2)
ω−1x 8q3/21,0
Q21,1/2+(−1)i+1p
√N1N2
B2,1−τiB1,1
+ n
(m−1)(m+ 1) [
(−1)i+1 ( p
N2 − p N1
)
−ω−2 ]
, Ai= (−1)i+1
√
n(n+ 1) (m+ 1)(m+ 2)
ω−1
2√q1,0x−(−1)i+1 2
( p N2 − p
N1 )
+ω−2 2 . These expansions lead that
Ri= (−1)i+1√ ˆ
vx+√Ui+ (−1)i+1ubi
V = (−1)i+1x+Ri,1/2+Ri,1+O3/2,
where
Ri,1/2=−1
2(−1)i+1xQ˜2,1/2+ ˜Ui,1/2, Ri,1= (−1)i+1x
(3 8
Q˜22,1/2−1 2
Q˜2,1
)
−1 2
U˜i,1/2Q˜2,1/2+ ˜Ui,1
with
U˜i,j/2=Ui,j/2 / {√
n2(n+ 1)
(m+ 1)2(m+ 2)s0ω−1 }
, Q˜i,j/2=Qi,j/2
/ { n2(n+ 1) (m+ 1)2(m+ 2)s0
} . By Taylor expansion,
Φ(Ri) = Φ((−1)i+1x) +ϕ((−1)i+1x)[
Ri,1/2+Ri,1
]−(−1)i+1x
2 ϕ((−1)i+1x)R2i,1/2+O3/2. SinceRi,1/2 is represented as the linear combination of{Z1, Z2, Z3, W1, . . . , W4}, E[Ri,1/2] = 0. As a result, we give the following theorem.
Theorem 1. Let
˜
xi=x−{
(−1)i+1E[R\i,1]−x
2E[R\2i,1/2] }
, whereE[R\ki,j] isE[Rki,j] with replacing∆2 by ∆c2. Fori= 1,2,
P (
(−1)i+1W−ubi
√vˆ <(−1)i+1x˜i
x∈Πi
)
= Φ((−1)i+1x) +O3/2. Explicit formula ofE[Ri,1] andE[R2i,1/2] can be derived, which are given in Appendix.
Based on the expansion, we set cutoff pointci as ci=√
ˆ v [
(−1)i+1zα− {
(−1)i+1E[R\i,1]−(−1)i+1zα
2 E[R\2i,1/2] }]
+ ˆui (i= 1,2),
wherezαis theαpercentile point of the standard normal distribution. The cutoff pointc1(c2) makes the desired misclassification probability to be α within the error O3/2. The other misclassification probabilities can be described as
P(W > c1|x∈Π2) =E [
Φ
(−(c1−cu2) +U2−uc2
√V
)]
, P(W < c2|x∈Π1) =E
[ Φ
((c2−cu1) +U1+cu1
√V
)]
, respectively. Note that
ˆ
v/V →p 1, U1+ ˆu1
→p 0, U2−uˆ2
→p 0, E[Ri,j/2] =Oj/2 (i, j= 1,2).
From the expansion, we have c
u1−cu2= n m−1
{ N p N1N2
m−1 n
u′1B−1u1
p − N p N1N2
}
=ω−2(1 +ω2∆2) n
m+ 1 − n
m−1ω−2+O1/2. It will be found that
c
u1−cu2− n
m∆2→p 0 under the asymptotic framework A1. In addition,
ˆ v− n3
m3ω−2(1 +ω2∆2)→p 0.
Combining these results,
limA1 P(W > c1|x∈Π2) = lim
A1P(W < c2|x∈Π1) = Φ (
z1−α−lim
A1
√m n
∆2
√∆2+ω−2 )
.
A Proof of Lemma 1
Proof of Lemma1. LetΓbe a orthogonal matrix of orderpwhich the first row is proportional toδ′, and letB=ΓAΓ′ andwi =Γvi,i= 1,2. ThenB∼Wp(n,Ip),w1∼Np(∆e1,Ip),w2∼Np(0,Ip) andw1,w2 andB are independent;
S= (Γδ)′(ΓAΓ′)−1(Γv1)= ∆eD ′1B−1w1, T= (Γv2)′(ΓAΓ′)−1(Γv1)=Dw′2B−1w1=D
√
w′1B−2w1Z U = (Γv1)′(ΓAΓ′)−1(Γv1)=Dw′1B−1w1,
V = (Γv1)′(ΓAΓ′)−2(Γv1)=D w′1B−2w1,
where ei denotes fundamental vector with 1 in i-th position, Z ∼ N(0,1), and Z and {B,w1} are independent. By using reflection matrix(Householder matrix)Hbetweene1 and (1/√
w′1w1)w1, S = ∆D √
w′1w1(He1)′(HBH′)−1{H(1/√
w′1w1)w1}= ∆w′1(HBH′)−1e1. Besides,
U =D w′1B−1w1=w′1w1·e′1(HBH′)−1e1, V =D w′1B−2w1=w′1w1·e′1(HBH′)−2e1. Givenw1, C≡HBH′ ∼Wp(n,Ip), soC andw1 are independent. Partition
C=
(c11 c′21 c21 C22
)
and w1= (w11
w21
) . It can be expressed that
S= ∆wD ′1C−1e1= ∆ c11·2
(w11−w′21C−221c21), wherec11·2=c11−c′21C−221c21. In addition,
U =D w′1w1·e′1C−1e1= w′1w1 c11·2
, V =D w′1w1·e′1C−2e1= w′1w1
c211·2 (1 +c′21C−221c21).
It is noted thatx≡C−221/2c21∼Np−1(0,Ip),D≡C22∼Wp−1(n,Ip−1), andxandD are indepen- dent, thusw11,w21,x,D andc11·2 are independent. Using these results, we have
S =D ∆ c11·2
(w11−w′21D−1/2x) and V =D w′11w11
c211·2 (1 +x′D−1x).
LetG be orthogonal matrix of order p−1 which the first row is proportional to x′D−1/2. Givenx and D, y ≡ Gw21 ∼ Np−1(0,Ip−1), and it is found that w11, c11·2, x, D and y are independent.
Partitioningy= (y1y′2)′, we have S=D ∆
c11·2{w11−(Gw21)′(GD−1/2x)}=D ∆ c11·2
(w11−√
x′D−1xy1), U =D w112 + (Gw21)′(Gw21)
c11·2
=D 1 c11·2
(w211+y12+y′2y2), V =D w211+ (Gw21)′(Gw21)
c211·2 (1 +x′D−1x)=D 1
c211·2(1 +x′D−1x)(w211+y21+y′2y2).
These show the conclusion of the lemma.
B Expectations
Firstly, we calculateE[R2i,1/2]. It is that E[R2i,1/2] = x2
4 E[ ˜Q22,1/2] +E[ ˜Ui,1/22 ]−(−1)i+1xE[ ˜Q2,1/2·U˜i,1/2].
SinceS1/2⊥⊥T1/2⊥⊥W1, E[Q22,1/2] = n4
f14 [(
1 + f2
f3 )2
E[S1/22 ] + (f2
f3 )2
s20E[T1/22 ] + 4· 2 f1
( 1 + f2
f3 )2
s20E[W12] ]
. Noting that
S1/22 = 4ω2∆2
p Z12+ 2 f4
W42+ 22ω∆
√p
√2 f4
Z1W4, it holds that
E[S1/22 ] = 4ω2∆2 p + 2
f4. It also holds that
E[T1/22 ] = 2 f2
E[W22] + 2 f3
E[W32] = 2 f2
+ 2 f3
. Thus,
E[Q22,1/2] = n4 f14
[(
1 +f2
f3 )2(
4ω2∆2 p + 2
f4 )
+ (f2
f3 )2
(1 +ω2∆2)2 (2
f2 + 2 f3
)
+8 f1
( 1 + f2
f3
)2
(1 +ω2∆2)2 ]
. For evaluatingE[Ui,1/22 ], note that
Q21,1/2=n2 f12
(
S1/22 + 2
f1s20W12−2
√2
f1s0S1/2W1
) . Thus,
E[Q21,1/2] = n2 f12
[(4ω2∆2 p + 2
f4
) + 2
f1
s20 ]
= n2 f12
[2 f4
+ 2 f1
+ (4
p+ 2 f1
) ω2∆2
] . Moreover, the following equalities hold.
E[B22,1/2] = n2 f12s0
( 1 +f2
f3 )1
p =n2 f12
( 1 + f2
f3 )
(1 +ω2∆2)1 p, E[B21,1/2] = n2
f12 {(1
p+f2 f3
1 p
)
ω∆2+ 2 f1
ω2∆4 }
. From independence,
E[B1,1/2·B2,1/2] =E[B1,1/2]·E[B2,1/2] = 0, E[Q1,1/2·B2,1/2] =E[Q1,1/2]·E[B2,1/2] = 0.