Linear Discriminant Analysis, Explained
Intuitions, illustrations, and maths: How it’s more than a dimension reduction tool and why it’s robust for real-
Why LDA?
LDA builds on the basic concept of classification, which is used across many Data Science fields, especially Machine Learning.
Consider only two dimensions with two distinct clusters. LDA projects these clusters down to one dimension. Imagine it fitting a separate probability density function to each class (cluster); we then try to maximize the difference between these densities, effectively by minimizing the area of overlap between them:
From Sebastian Raschka: https://sebastianraschka.com/Articles/2014_python_lda.html
In the example above, the blue and green clusters are perfectly separated along the x-axis. This means that if future data points follow the proposed probability density functions, we should be able to classify them perfectly as either blue or green.
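In Linear Discriminant Analysis (LDA), we assume that the two classes have equal covariance matrices, $\Sigma_1 = \Sigma_2 = \Sigma$. Let $f_i(x)$ be the density of class $i$ and $\pi_i$ its prior probability. Setting the derivative of the misclassification probability with respect to the boundary point $x$ to zero gives the decision boundary:
$$\frac{\partial}{\partial x} P(\text{error}) = -f_1(x)\pi_1 + f_2(x)\pi_2 = 0 \;\Longrightarrow\; f_1(x)\pi_1 = f_2(x)\pi_2$$
Substituting the multivariate Gaussian densities in $d$ dimensions:
$$\frac{1}{\sqrt{(2\pi)^d |\Sigma_1|}} \exp\left(-\frac{(x-\mu_1)^T \Sigma_1^{-1} (x-\mu_1)}{2}\right)\pi_1 = \frac{1}{\sqrt{(2\pi)^d |\Sigma_2|}} \exp\left(-\frac{(x-\mu_2)^T \Sigma_2^{-1} (x-\mu_2)}{2}\right)\pi_2$$
With the shared covariance matrix $\Sigma$, the normalizing constants cancel:
$$\exp\left(-\frac{(x-\mu_1)^T \Sigma^{-1} (x-\mu_1)}{2}\right)\pi_1 = \exp\left(-\frac{(x-\mu_2)^T \Sigma^{-1} (x-\mu_2)}{2}\right)\pi_2$$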
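Taking the natural logarithm of both sides:
$$-\frac{(x-\mu_1)^T \Sigma^{-1} (x-\mu_1)}{2} + \ln\pi_1 = -\frac{(x-\mu_2)^T \Sigma^{-1} (x-\mu_2)}{2} + \ln\pi_2$$
Next, we expand the quadratic form: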
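$$\begin{aligned}
(x-\mu_1)^T \Sigma^{-1} (x-\mu_1) &= (x^T - \mu_1^T)\,\Sigma^{-1} (x-\mu_1) \\
&= x^T\Sigma^{-1}x - x^T\Sigma^{-1}\mu_1 - \mu_1^T\Sigma^{-1}x + \mu_1^T\Sigma^{-1}\mu_1 \\
&= x^T\Sigma^{-1}x - 2\mu_1^T\Sigma^{-1}x + \mu_1^T\Sigma^{-1}\mu_1
\end{aligned}$$
The last step uses the symmetry of $\Sigma^{-1}$, so that $x^T\Sigma^{-1}\mu_1 = \mu_1^T\Sigma^{-1}x$.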
Similarly, we obtain
$$(x-\mu_2)^T \Sigma^{-1} (x-\mu_2) = x^T\Sigma^{-1}x - 2\mu_2^T\Sigma^{-1}x + \mu_2^T\Sigma^{-1}\mu_2$$
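Substituting both expansions, the common term $x^T\Sigma^{-1}x$ cancels, so we obtain
$$\frac{2\mu_1^T\Sigma^{-1}x - \mu_1^T\Sigma^{-1}\mu_1}{2} + \ln\pi_1 = \frac{2\mu_2^T\Sigma^{-1}x - \mu_2^T\Sigma^{-1}\mu_2}{2} + \ln\pi_2$$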
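Multiplying by 2 and collecting all terms on one side defines the linear discriminant, which is zero on the decision boundary:
$$\delta(x) = 2\left(\Sigma^{-1}(\mu_2-\mu_1)\right)^T x + (\mu_1-\mu_2)^T \Sigma^{-1} (\mu_1+\mu_2) + 2\ln\frac{\pi_2}{\pi_1}$$
Here $(\mu_1-\mu_2)^T \Sigma^{-1} (\mu_1+\mu_2) = \mu_1^T\Sigma^{-1}\mu_1 - \mu_2^T\Sigma^{-1}\mu_2$. The class of an instance $x$ is estimated as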
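$$\hat{C}(x) = \begin{cases} 1, & \text{if } \delta(x) < 0 \\ 2, & \text{if } \delta(x) > 0 \end{cases}$$
Alternatively, we can use the discriminant function
$$f_i(x_k) = \mu_i C^{-1} x_k^T - \frac{1}{2}\,\mu_i C^{-1} \mu_i^T + \ln(p_i)$$
where $\mu_i$ and $x_k$ are row vectors, $C$ is the shared (pooled) covariance matrix and $p_i$ is the prior probability of group $i$. We assign object $k$ to the group $i$ that has the maximum $f_i$.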
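For two classes, the maximum-$f_i$ rule and the sign rule agree. As a short check (a sketch that writes the means as column vectors and uses $\Sigma$ for the shared covariance $C$ and $\pi_i$ for $p_i$, a notational assumption made here only to match the derivation), subtracting the two discriminant functions leaves a function that is linear in $x$:

```latex
\begin{aligned}
f_2(x) - f_1(x)
  &= (\mu_2 - \mu_1)^T \Sigma^{-1} x
   - \tfrac{1}{2}\left(\mu_2^T \Sigma^{-1} \mu_2 - \mu_1^T \Sigma^{-1} \mu_1\right)
   + \ln\frac{\pi_2}{\pi_1}
\end{aligned}
```

So $f_2 > f_1$ exactly when this expression is positive, and the boundary $f_1 = f_2$ is the same equality reached at the end of the derivation. The quadratic term $x^T\Sigma^{-1}x$ never appears, which is why the boundary is a straight line (or hyperplane).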
The following data describe chip rings that passed (= 1) or did not pass (= 2) the quality-control test at a company that produces high-quality chip rings. The test considers two features, curvature and diameter. Data:
class   X1 (curvature)   X2 (diameter)
1       2.95             6.63
1       2.53             7.79
1       3.57             5.65
1       3.16             5.47
2       2.58             4.46
2       2.16             6.22
2       3.27             3.52
x = features of all the data. Each row represents one object (one chip ring); each column stands for one feature. y = class index (passed = 1; not passed = 2).
Suppose a chip ring has curvature 2.81 and diameter 5.46. Determine its quality-control class using LDA.
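Data:
$$X = \begin{bmatrix} 2.95 & 6.63 \\ \vdots & \vdots \\ 3.27 & 3.52 \end{bmatrix}, \qquad y = \begin{bmatrix} 1 \\ 1 \\ \vdots \\ 2 \end{bmatrix}$$
$X_i$ = feature data for group $i$, $i = 1, 2$:
$$X_1 = \begin{bmatrix} 2.95 & 6.63 \\ 2.53 & 7.79 \\ 3.57 & 5.65 \\ 3.16 & 5.47 \end{bmatrix}, \qquad X_2 = \begin{bmatrix} 2.58 & 4.46 \\ 2.16 & 6.22 \\ 3.27 & 3.52 \end{bmatrix}$$
$x_k$ = data of row $k$. For example, $x_1 = [2.95 \;\; 6.63]$ and $x_7 = [3.27 \;\; 3.52]$.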
g = number of groups in y. In this case, g = 2.
μi = mean of the features in group i, i.e. the column average of Xi:
μ1 = [ 3.05 6.38] and μ2 = [ 2.67 4.73]
μ = global mean vector, i.e. the mean of the whole data set. In this example μ = [ 2.88 5.676]
xi0 = mean-corrected data: the feature data of group i, Xi, minus the global mean vector.
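$$x_1^0 = \begin{bmatrix} 0.060 & 0.951 \\ -0.357 & 2.109 \\ 0.679 & -0.025 \\ 0.269 & -0.209 \end{bmatrix}, \qquad x_2^0 = \begin{bmatrix} -0.305 & -1.218 \\ -0.732 & 0.547 \\ 0.386 & -2.155 \end{bmatrix}$$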
The covariance matrix of group i is then obtained with the formula
$$c_i = \frac{(x_i^0)^T x_i^0}{n_i}$$
where $n_i$ is the number of objects in group $i$.
$$c_1 = \begin{bmatrix} 0.166 & -0.192 \\ -0.192 & 1.349 \end{bmatrix}, \qquad c_2 = \begin{bmatrix} 0.259 & -0.286 \\ -0.286 & 2.142 \end{bmatrix}$$
Next, compute the pooled within-group covariance matrix C. Each entry (r, s) is the weighted average of the corresponding group entries, where n is the total number of objects:
$$C(r,s) = \frac{1}{n} \sum_{i=1}^{g} n_i\, c_i(r,s)$$
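$$C = \begin{bmatrix} \frac{4}{7}(0.166) + \frac{3}{7}(0.259) & \frac{4}{7}(-0.192) + \frac{3}{7}(-0.286) \\ \frac{4}{7}(-0.192) + \frac{3}{7}(-0.286) & \frac{4}{7}(1.349) + \frac{3}{7}(2.142) \end{bmatrix} = \begin{bmatrix} 0.206 & -0.233 \\ -0.233 & 1.689 \end{bmatrix}$$
$$C^{-1} = \begin{bmatrix} 5.745 & 0.791 \\ 0.791 & 0.701 \end{bmatrix}$$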
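As a rough cross-check of the steps so far, the following base R sketch recomputes the global mean, the group covariance matrices (mean-corrected with the global mean, as defined in the text), the pooled matrix C and its inverse from the raw table. The names `X`, `y`, `group_cov`, `n_i`, `C` and `C_inv` are illustrative choices, not part of the original example. Because it works at full precision instead of with rounded intermediate values, its output can differ slightly from the figures quoted in the text.

```r
# Chip ring data: column 1 = curvature, column 2 = diameter
X <- matrix(c(2.95, 6.63,
              2.53, 7.79,
              3.57, 5.65,
              3.16, 5.47,
              2.58, 4.46,
              2.16, 6.22,
              3.27, 3.52), ncol = 2, byrow = TRUE)
y <- c(1, 1, 1, 1, 2, 2, 2)   # 1 = passed, 2 = not passed

mu <- colMeans(X)             # global mean vector

# Covariance of group i as defined in the text:
# subtract the GLOBAL mean, then (Xi0' Xi0) / n_i
group_cov <- function(i) {
  Xi0 <- sweep(X[y == i, , drop = FALSE], 2, mu)
  crossprod(Xi0) / nrow(Xi0)
}

n_i <- as.vector(table(y))    # group sizes: 4 and 3

# Pooled within-group covariance: weighted average of the group matrices
C <- (n_i[1] * group_cov(1) + n_i[2] * group_cov(2)) / length(y)
C_inv <- solve(C)

print(round(C, 3))
print(round(C_inv, 3))
```

`crossprod(Xi0)` is shorthand for `t(Xi0) %*% Xi0`, and `solve(C)` with a single argument returns the matrix inverse.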
This pooled covariance matrix is the one used for every class when forming the discriminant functions for classification (building the model).
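Next, compute the probability of each class i. P = prior probability vector. If we don't know the prior probability, we simply estimate it from the group sizes, where N is the total number of objects:
$$p_i = \frac{n_i}{N}, \qquad P = \begin{bmatrix} 4/7 \\ 3/7 \end{bmatrix}$$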
Each row of the chip ring data, transposed first to xkT, is then substituted into the discriminant functions. Whether a ring passes quality control is decided by which of f1 and f2 has the larger value.
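For example, for x1 = [2.95 6.63] we evaluate each discriminant function and assign the object to the group i that has the maximum fi:
$$f_i(x_k) = \mu_i C^{-1} x_k^T - \frac{1}{2}\,\mu_i C^{-1} \mu_i^T + \ln(p_i)$$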
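We have the following discriminant functions:
$$f_1(x_k) = \begin{bmatrix} 3.05 & 6.38 \end{bmatrix} \begin{bmatrix} 5.745 & 0.791 \\ 0.791 & 0.701 \end{bmatrix} \left( x_k^T - \frac{1}{2} \begin{bmatrix} 3.05 \\ 6.38 \end{bmatrix} \right) + \ln\frac{4}{7}$$
$$f_2(x_k) = \begin{bmatrix} 2.67 & 4.73 \end{bmatrix} \begin{bmatrix} 5.745 & 0.791 \\ 0.791 & 0.701 \end{bmatrix} \left( x_k^T - \frac{1}{2} \begin{bmatrix} 2.67 \\ 4.73 \end{bmatrix} \right) + \ln\frac{3}{7}$$
For the first object, $x_1 = [2.95 \;\; 6.63]$:
$$f_1(x_1) = \begin{bmatrix} 3.05 & 6.38 \end{bmatrix} \begin{bmatrix} 5.745 & 0.791 \\ 0.791 & 0.701 \end{bmatrix} \left( \begin{bmatrix} 2.95 \\ 6.63 \end{bmatrix} - \frac{1}{2} \begin{bmatrix} 3.05 \\ 6.38 \end{bmatrix} \right) + \ln\frac{4}{7}$$
$$f_2(x_1) = \begin{bmatrix} 2.67 & 4.73 \end{bmatrix} \begin{bmatrix} 5.745 & 0.791 \\ 0.791 & 0.701 \end{bmatrix} \left( \begin{bmatrix} 2.95 \\ 6.63 \end{bmatrix} - \frac{1}{2} \begin{bmatrix} 2.67 \\ 4.73 \end{bmatrix} \right) + \ln\frac{3}{7}$$
Classification results: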
Training data, D               Discriminant function   Results
class        X1     X2         f1        f2            Classification
1            2.95   6.63       55.220    53.071        1
1            2.53   7.79       53.774    51.394        1
1            3.57   5.65       62.476    59.589        1
1            3.16   5.47       51.953    50.764        1
2            2.58   4.46       32.028    34.313        2
2            2.16   6.22       34.554    35.757        2
2            3.27   3.52       41.174    42.414        2
prediction   2.81   5.46       44.049    44.085        2
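The classification step can be sketched in base R by plugging the rounded quantities from the text (μ1, μ2, the inverse pooled covariance and the priors 4/7 and 3/7) into the discriminant function. The helper names `lda_score` and `classify` are hypothetical, chosen for this illustration. Since the inputs are rounded to three decimals, the scores it prints may differ from the table in the first or second decimal place, but the assigned classes are the same.

```r
# Rounded quantities taken from the worked example
mu1   <- c(3.05, 6.38)                       # mean of group 1 (passed)
mu2   <- c(2.67, 4.73)                       # mean of group 2 (not passed)
C_inv <- matrix(c(5.745, 0.791,
                  0.791, 0.701), nrow = 2, byrow = TRUE)
prior <- c(4 / 7, 3 / 7)

# Discriminant function f_i(x) = mu C^-1 x - 0.5 * mu C^-1 mu + ln(p)
lda_score <- function(x, mu, p) {
  as.numeric(t(mu) %*% C_inv %*% x - 0.5 * t(mu) %*% C_inv %*% mu + log(p))
}

# Return the index (1 or 2) of the group with the larger score
classify <- function(x) {
  f <- c(lda_score(x, mu1, prior[1]), lda_score(x, mu2, prior[2]))
  which.max(f)
}

# The seven training rings followed by the new ring (2.81, 5.46)
X <- rbind(c(2.95, 6.63), c(2.53, 7.79), c(3.57, 5.65), c(3.16, 5.47),
           c(2.58, 4.46), c(2.16, 6.22), c(3.27, 3.52), c(2.81, 5.46))

pred <- apply(X, 1, classify)
print(pred)   # the last entry is the class assigned to the new ring
```

The new ring lies very close to the boundary: its two scores differ by only a few hundredths, so the class 2 decision is sensitive to rounding in the inputs.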
Examples :
https://real-statistics.com/multivariate-statistics/discriminant-analysis/linear-discriminant-analysis/
Example 1: Perform discriminant analysis on the data in
Example 1 of MANOVA Basic Concepts. This data is repeated
in Figure 1 (in two columns for easier readability). Also
determine in which category to put the vector X with yield 60,
water 25 and herbicide 6.
# Load the packages used in this example
library(klaR)
library(psych)     # pairs.panels()
library(MASS)      # lda()
library(devtools)
# Load and inspect the iris data
data("iris")
str(iris)
First, we create a scatterplot matrix for the four numerical variables. The gap between the panels is set to zero.
pairs.panels(iris[1:4], gap = 0,
bg = c("red", "green", "blue") [iris$Species],
pch = 21)
Let's create a training dataset and a test dataset for prediction and testing purposes. With the sampling probabilities used here, about 99% of the rows are used for training and about 1% for testing.
set.seed(123)
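# Randomly assign each row to group 1 (training) or group 2 (testing)
ind <- sample(2, nrow(iris), replace = TRUE,
              prob = c(0.99, 0.01))
training <- iris[ind == 1, ]
testing <- iris[ind == 2, ]
# Fit the LDA model and print it
linear <- lda(Species ~ ., training)
linear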
Call:
lda(Species ~ ., data = training)

Prior probabilities of groups:
    setosa versicolor  virginica
 0.3288591  0.3355705  0.3355705

Group means:
           Sepal.Length Sepal.Width Petal.Length Petal.Width
setosa         5.004082    3.430612     1.457143   0.2408163
versicolor     5.936000    2.770000     4.260000   1.3260000
virginica      6.588000    2.974000     5.552000   2.0260000

Coefficients of linear discriminants:
                    LD1        LD2
Sepal.Length  0.8145327  0.0188473
Sepal.Width   1.5593494  2.1672295
Petal.Length -2.1751998 -0.9203999
Petal.Width  -2.8741576  2.8124736
"Coefficients of linear discriminants" shows the linear combinations of the predictor variables that are used to form the LDA decision rule. For example,
LD1 = 0.8145327*Sepal.Length + 1.5593494*Sepal.Width - 2.1751998*Petal.Length - 2.8741576*Petal.Width.
Similarly,
LD2 = 0.0188473*Sepal.Length + 2.1672295*Sepal.Width - 0.9203999*Petal.Length + 2.8124736*Petal.Width.
Proportion of trace:
   LD1    LD2
0.9913 0.0087
Based on the prior probabilities in the output, 32.9% of the training observations belong to the setosa group, 33.6% to versicolor and 33.6% to virginica. The first discriminant function achieves 99.13% of the separation between groups and the second 0.87%.
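A natural next step is to check how well the fitted model classifies. The sketch below is one possible way to do this, not part of the original analysis: it refits the same model and builds a confusion matrix with `predict()` from MASS. The names `pred`, `tab` and `accuracy` are illustrative. It evaluates on the training rows because the split above leaves very few rows for testing, so the resulting accuracy is an optimistic estimate.

```r
library(MASS)  # provides lda() and its predict() method

data("iris")
set.seed(123)
ind <- sample(2, nrow(iris), replace = TRUE, prob = c(0.99, 0.01))
training <- iris[ind == 1, ]

linear <- lda(Species ~ ., training)

# Predicted class for each training row
pred <- predict(linear, training)$class

# Confusion matrix: rows are predictions, columns are the true species
tab <- table(Predicted = pred, Actual = training$Species)
print(tab)

# Share of training rows classified correctly
accuracy <- sum(diag(tab)) / sum(tab)
print(accuracy)
```

Besides `class`, the list returned by `predict()` also contains `posterior` (class probabilities) and `x` (the LD1 and LD2 scores), which can be used for plotting the projected data.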