Linear Discriminant Analysis, Explained


Intuitions, illustrations, and maths: How it’s more than a dimension reduction tool and why it’s robust for real-

Why LDA?

LDA addresses the basic problem of classification, which is used throughout data science and especially in machine learning.

Consider two distinct clusters in two dimensions. LDA projects these clusters down to one dimension. Imagine fitting a separate probability density function to each class (cluster) along that dimension; LDA then tries to maximize the difference between them, which effectively minimizes the area of overlap between the densities:

From Sebastian Raschka: https://sebastianraschka.com/Articles/2014_python_lda.html

In the example above, the blue and green clusters are perfectly separated along the x-axis. If future data points follow the fitted probability density functions, we should be able to classify them perfectly as either blue or green.
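As a rough illustration of this projection idea, the following R sketch simulates two well-separated 2-D clusters and projects them onto a single discriminant axis. The cluster centres, spread, sample size and variable names are made-up assumptions, not values taken from the figure.

```r
# Minimal sketch: project two simulated 2-D clusters onto one discriminant axis.
# All numbers here (centres, spread, sample size) are illustrative assumptions.
set.seed(1)
n <- 50
blue  <- cbind(rnorm(n, mean = 0, sd = 0.5), rnorm(n, mean = 0, sd = 0.5))
green <- cbind(rnorm(n, mean = 6, sd = 0.5), rnorm(n, mean = 6, sd = 0.5))

mu_blue  <- colMeans(blue)
mu_green <- colMeans(green)

# Pooled within-class covariance (both classes have the same size here).
S_w <- (cov(blue) + cov(green)) / 2

# Direction that separates the class means relative to the within-class spread.
w <- solve(S_w, mu_green - mu_blue)

# One-dimensional projections of each cluster.
z_blue  <- as.vector(blue %*% w)
z_green <- as.vector(green %*% w)

# Overlap check: TRUE means every blue projection lies below every green one.
no_overlap <- max(z_blue) < min(z_green)
print(range(z_blue))
print(range(z_green))
print(no_overlap)
```

With clusters this far apart the two projected ranges should not overlap, which is the one-dimensional picture of "perfect separation" described above.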


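In Linear Discriminant Analysis (LDA) with two classes, let f₁(x) and f₂(x) be the class densities and π₁ and π₂ the class priors. Setting the derivative of the probability of error with respect to the boundary point x to zero gives the decision boundary:

$$-f_1(x)\,\pi_1 + f_2(x)\,\pi_2 = 0 \quad\Longrightarrow\quad f_1(x)\,\pi_1 = f_2(x)\,\pi_2$$

For Gaussian classes in d dimensions with means μ₁, μ₂ and covariance matrices Σ₁, Σ₂, this reads

$$\frac{1}{\sqrt{(2\pi)^d\,|\Sigma_1|}}\exp\!\left(-\frac{(x-\mu_1)^T\Sigma_1^{-1}(x-\mu_1)}{2}\right)\pi_1 = \frac{1}{\sqrt{(2\pi)^d\,|\Sigma_2|}}\exp\!\left(-\frac{(x-\mu_2)^T\Sigma_2^{-1}(x-\mu_2)}{2}\right)\pi_2$$

LDA assumes that the two classes have equal covariance matrices, Σ₁ = Σ₂ = Σ, so the normalizing factors cancel:

$$\exp\!\left(-\frac{(x-\mu_1)^T\Sigma^{-1}(x-\mu_1)}{2}\right)\pi_1 = \exp\!\left(-\frac{(x-\mu_2)^T\Sigma^{-1}(x-\mu_2)}{2}\right)\pi_2$$

Taking the natural logarithm of both sides:

$$-\frac{(x-\mu_1)^T\Sigma^{-1}(x-\mu_1)}{2} + \ln\pi_1 = -\frac{(x-\mu_2)^T\Sigma^{-1}(x-\mu_2)}{2} + \ln\pi_2$$

Next, expand the quadratic form (the last step uses the symmetry of Σ⁻¹):

$$\begin{aligned}
(x-\mu_1)^T\Sigma^{-1}(x-\mu_1) &= (x^T-\mu_1^T)\,\Sigma^{-1}(x-\mu_1) \\
&= x^T\Sigma^{-1}x - x^T\Sigma^{-1}\mu_1 - \mu_1^T\Sigma^{-1}x + \mu_1^T\Sigma^{-1}\mu_1 \\
&= x^T\Sigma^{-1}x - 2\mu_1^T\Sigma^{-1}x + \mu_1^T\Sigma^{-1}\mu_1
\end{aligned}$$

Similarly,

$$(x-\mu_2)^T\Sigma^{-1}(x-\mu_2) = x^T\Sigma^{-1}x - 2\mu_2^T\Sigma^{-1}x + \mu_2^T\Sigma^{-1}\mu_2$$

The term xᵀΣ⁻¹x cancels on both sides, so we obtain

$$\frac{2\mu_1^T\Sigma^{-1}x - \mu_1^T\Sigma^{-1}\mu_1}{2} + \ln\pi_1 = \frac{2\mu_2^T\Sigma^{-1}x - \mu_2^T\Sigma^{-1}\mu_2}{2} + \ln\pi_2$$

Multiplying by 2 and moving everything to the right-hand side defines the discriminant δ(x), which is zero on the decision boundary:

$$\delta(x) = 2\left(\Sigma^{-1}(\mu_2-\mu_1)\right)^T x + (\mu_1-\mu_2)^T\Sigma^{-1}(\mu_1+\mu_2) + 2\ln\frac{\pi_2}{\pi_1}$$

The class of an instance x is estimated as

$$\hat{C}(x) = \begin{cases} 1, & \text{if } \delta(x) < 0 \\ 2, & \text{if } \delta(x) > 0 \end{cases}$$

Alternatively, one can use the discriminant function

$$f_i(x_k) = \mu_i C^{-1} x_k^T - \frac{1}{2}\,\mu_i C^{-1}\mu_i^T + \ln(p_i)$$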
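The two-class rule can be sketched in R as follows. This is a minimal sketch under the equal-covariance assumption, with the discriminant written as δ(x) = 2(Σ⁻¹(μ₂−μ₁))ᵀx + μ₁ᵀΣ⁻¹μ₁ − μ₂ᵀΣ⁻¹μ₂ + 2 ln(π₂/π₁), which is what remains when the two sides of the log equation are subtracted. The function names `lda_delta` and `classify` and the toy means, covariance and priors are hypothetical, chosen only so the result can be checked by hand.

```r
# Two-class LDA rule with a shared covariance matrix (sketch).
# delta(x) < 0 -> class 1, delta(x) > 0 -> class 2.
lda_delta <- function(x, mu1, mu2, Sigma, pi1, pi2) {
  Sigma_inv <- solve(Sigma)
  # Linear term: 2 * (Sigma^-1 (mu2 - mu1))^T x
  linear_term <- 2 * sum((Sigma_inv %*% (mu2 - mu1)) * x)
  # Constant term: mu1^T Sigma^-1 mu1 - mu2^T Sigma^-1 mu2
  constant_term <- drop(t(mu1) %*% Sigma_inv %*% mu1 -
                        t(mu2) %*% Sigma_inv %*% mu2)
  linear_term + constant_term + 2 * log(pi2 / pi1)
}

# Ties (delta exactly 0) are sent to class 2 here; that choice is arbitrary.
classify <- function(x, ...) if (lda_delta(x, ...) < 0) 1L else 2L

# Hypothetical toy values (not from the text): identity covariance, equal priors.
mu1 <- c(0, 0)
mu2 <- c(2, 2)
Sigma <- diag(2)

d_a <- lda_delta(c(0.5, 0.5), mu1, mu2, Sigma, 0.5, 0.5)
d_b <- lda_delta(c(2, 1),     mu1, mu2, Sigma, 0.5, 0.5)
print(c(d_a, d_b))
print(c(classify(c(0.5, 0.5), mu1, mu2, Sigma, 0.5, 0.5),
        classify(c(2, 1),     mu1, mu2, Sigma, 0.5, 0.5)))
```

With these toy values δ reduces to 4x₁ + 4x₂ − 8, so the point (0.5, 0.5) falls on the class 1 side and (2, 1) on the class 2 side.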


Assign object k to the group i with the largest f_i; that group is the predicted class.

The following data describe chip rings that passed (class 1) or did not pass (class 2) the quality-control test of a company that produces high-quality chip rings. The test considers curvature and diameter. Data:

class   X1     X2
1       2.95   6.63
1       2.53   7.79
1       3.57   5.65
1       3.16   5.47
2       2.58   4.46
2       2.16   6.22
2       3.27   3.52

X = features of all the data. Each row represents one object; each column stands for one feature. y = class index (passed = 1; not passed = 2).

Suppose a chip ring has curvature 2.81 and diameter 5.46. Determine its quality-control class using LDA.

Data

$$X = \begin{bmatrix} 2.95 & 6.63 \\ 2.53 & 7.79 \\ 3.57 & 5.65 \\ 3.16 & 5.47 \\ 2.58 & 4.46 \\ 2.16 & 6.22 \\ 3.27 & 3.52 \end{bmatrix} \quad\text{and}\quad y = \begin{bmatrix} 1 \\ 1 \\ 1 \\ 1 \\ 2 \\ 2 \\ 2 \end{bmatrix}$$

• X_i = feature data for group i, i = 1, 2:

$$X_1 = \begin{bmatrix} 2.95 & 6.63 \\ 2.53 & 7.79 \\ 3.57 & 5.65 \\ 3.16 & 5.47 \end{bmatrix}, \qquad X_2 = \begin{bmatrix} 2.58 & 4.46 \\ 2.16 & 6.22 \\ 3.27 & 3.52 \end{bmatrix}$$

• x_k = data of row k. For example, x₁ = [2.95 6.63] and x₇ = [3.27 3.52].

• g = number of groups in y. In this case, g = 2.

• μ_i = mean of the features in group i, that is, the average of X_i: μ₁ = [3.05 6.38] and μ₂ = [2.67 4.73].

• μ = global mean vector, that is, the mean of the whole data set. In this example μ = [2.88 5.676].

• x_i⁰ = mean-corrected data: the feature data of group i minus the global mean vector:

$$x_1^0 = \begin{bmatrix} 0.060 & 0.951 \\ -0.357 & 2.109 \\ 0.679 & -0.025 \\ 0.269 & -0.209 \end{bmatrix} \quad\text{and}\quad x_2^0 = \begin{bmatrix} -0.305 & -1.218 \\ -0.732 & 0.547 \\ 0.386 & -2.155 \end{bmatrix}$$
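The group means, the global mean and the mean-corrected data can be reproduced with a few lines of base R. This is a sketch that starts from the two-decimal data as printed in the table, so its results may differ from the figures quoted in the text in the last digit. The object names (`mu_groups`, `x1_0`, and so on) are illustrative.

```r
# Chip-ring data from the example: columns are curvature (X1) and diameter (X2).
X <- matrix(c(2.95, 6.63,
              2.53, 7.79,
              3.57, 5.65,
              3.16, 5.47,
              2.58, 4.46,
              2.16, 6.22,
              3.27, 3.52),
            ncol = 2, byrow = TRUE,
            dimnames = list(NULL, c("curvature", "diameter")))
y <- c(1, 1, 1, 1, 2, 2, 2)

# Group means (one row per group) and the global mean vector.
mu_groups <- rbind(colMeans(X[y == 1, ]), colMeans(X[y == 2, ]))
mu_global <- colMeans(X)

# Mean-corrected data: subtract the global mean from every row of each group.
x1_0 <- sweep(X[y == 1, ], 2, mu_global)
x2_0 <- sweep(X[y == 2, ], 2, mu_global)

print(round(mu_groups, 2))
print(round(mu_global, 3))
print(round(x1_0, 3))
print(round(x2_0, 3))
```

Note that the correction uses the global mean rather than each group's own mean, following the definition of x_i⁰ in the text.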

The covariance matrix of group i is then obtained from

$$c_i = \frac{(x_i^0)^T\, x_i^0}{n_i}$$

which gives

$$c_1 = \begin{bmatrix} 0.166 & -0.192 \\ -0.192 & 1.349 \end{bmatrix} \quad\text{and}\quad c_2 = \begin{bmatrix} 0.259 & -0.286 \\ -0.286 & 2.142 \end{bmatrix}$$

• Next, compute the pooled within-group covariance matrix C, entry by entry:

$$C(r,s) = \frac{1}{n}\sum_{i=1}^{g} n_i\, c_i(r,s)$$

$$C = \begin{bmatrix} \tfrac{4}{7}(0.166) + \tfrac{3}{7}(0.259) & \tfrac{4}{7}(-0.192) + \tfrac{3}{7}(-0.286) \\ \tfrac{4}{7}(-0.192) + \tfrac{3}{7}(-0.286) & \tfrac{4}{7}(1.349) + \tfrac{3}{7}(2.142) \end{bmatrix} = \begin{bmatrix} 0.206 & -0.233 \\ -0.233 & 1.689 \end{bmatrix}$$

$$C^{-1} = \begin{bmatrix} 5.745 & 0.791 \\ 0.791 & 0.701 \end{bmatrix}$$
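The covariance steps can be checked with the base R sketch below. It follows the text's definition of c_i (centred on the global mean and divided by n_i) and again starts from the two-decimal data, so the printed matrices may differ from the quoted ones in the third decimal place. The helper name `group_cov` is hypothetical.

```r
X <- matrix(c(2.95, 6.63, 2.53, 7.79, 3.57, 5.65, 3.16, 5.47,
              2.58, 4.46, 2.16, 6.22, 3.27, 3.52),
            ncol = 2, byrow = TRUE)
y <- c(1, 1, 1, 1, 2, 2, 2)
mu_global <- colMeans(X)

# Covariance of group i as defined in the text: (x_i0)^T x_i0 / n_i,
# where x_i0 is the group's data centred on the global mean.
group_cov <- function(i) {
  x0 <- sweep(X[y == i, , drop = FALSE], 2, mu_global)
  crossprod(x0) / nrow(x0)
}
c1 <- group_cov(1)
c2 <- group_cov(2)

# Pooled within-group covariance: weight each c_i by n_i / n.
n_i <- as.vector(table(y))
C <- (n_i[1] * c1 + n_i[2] * c2) / sum(n_i)
C_inv <- solve(C)

print(round(c1, 3))
print(round(c2, 3))
print(round(C, 3))
print(round(C_inv, 3))
```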

This pooled covariance matrix is used for every class when forming the discriminant functions for classification (that is, when building the model).

• Compute the probability of class i. P = prior probability vector. If we do not know the prior probabilities, we assume each one equals the group's share of the samples, p_i = n_i / N:

$$P = \begin{bmatrix} 4/7 \\ 3/7 \end{bmatrix}$$

Each row of the chip-ring data, transposed to x_kᵀ, is then substituted into the discriminant function. Whether a chip ring passes quality control is decided by which of f₁ and f₂ is larger.

For example, for x₁ = [2.95 6.63] we assign the object to the group i with the largest f_i, where

$$f_i(x_k) = \mu_i C^{-1} x_k^T - \frac{1}{2}\,\mu_i C^{-1}\mu_i^T + \ln(p_i)$$

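The discriminant functions are therefore:

$$f_1(x_k) = \begin{bmatrix} 3.05 & 6.38 \end{bmatrix} \begin{bmatrix} 5.745 & 0.791 \\ 0.791 & 0.701 \end{bmatrix} \left( x_k^T - \frac{1}{2}\begin{bmatrix} 3.05 \\ 6.38 \end{bmatrix} \right) + \ln\frac{4}{7}$$

$$f_2(x_k) = \begin{bmatrix} 2.67 & 4.73 \end{bmatrix} \begin{bmatrix} 5.745 & 0.791 \\ 0.791 & 0.701 \end{bmatrix} \left( x_k^T - \frac{1}{2}\begin{bmatrix} 2.67 \\ 4.73 \end{bmatrix} \right) + \ln\frac{3}{7}$$

For the first object, x₁ = [2.95 6.63]:

$$f_1(x_1) = \begin{bmatrix} 3.05 & 6.38 \end{bmatrix} \begin{bmatrix} 5.745 & 0.791 \\ 0.791 & 0.701 \end{bmatrix} \left( \begin{bmatrix} 2.95 \\ 6.63 \end{bmatrix} - \frac{1}{2}\begin{bmatrix} 3.05 \\ 6.38 \end{bmatrix} \right) + \ln\frac{4}{7}$$

$$f_2(x_1) = \begin{bmatrix} 2.67 & 4.73 \end{bmatrix} \begin{bmatrix} 5.745 & 0.791 \\ 0.791 & 0.701 \end{bmatrix} \left( \begin{bmatrix} 2.95 \\ 6.63 \end{bmatrix} - \frac{1}{2}\begin{bmatrix} 2.67 \\ 4.73 \end{bmatrix} \right) + \ln\frac{3}{7}$$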

Classification results

              Training data, D    Discriminant function results
class         X1      X2          f1        f2        Classification
1             2.95    6.63        55.220    53.071    1
1             2.53    7.79        53.774    51.394    1
1             3.57    5.65        62.476    59.589    1
1             3.16    5.47        51.953    50.764    1
2             2.58    4.46        32.028    34.313    2
2             2.16    6.22        34.554    35.757    2
2             3.27    3.52        41.174    42.414    2
prediction    2.81    5.46        44.049    44.085    2
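The whole worked example can be reproduced end to end with the base R sketch below. It recomputes the pooled covariance, priors and discriminant scores from the two-decimal data as printed, so the scores it prints may differ slightly from the table; the assigned classes are the part to compare. The names `discriminant`, `scores` and `x_new` are illustrative.

```r
X <- matrix(c(2.95, 6.63, 2.53, 7.79, 3.57, 5.65, 3.16, 5.47,
              2.58, 4.46, 2.16, 6.22, 3.27, 3.52),
            ncol = 2, byrow = TRUE)
y <- c(1, 1, 1, 1, 2, 2, 2)
x_new <- c(2.81, 5.46)  # chip ring to classify

groups <- sort(unique(y))
mu_global <- colMeans(X)

# Pooled covariance as in the text: sum of n_i * c_i over groups, divided by n.
C <- Reduce(`+`, lapply(groups, function(g) {
  x0 <- sweep(X[y == g, , drop = FALSE], 2, mu_global)
  crossprod(x0)  # equals n_g * c_g
})) / nrow(X)
C_inv <- solve(C)

priors <- as.vector(table(y)) / length(y)
mu <- t(sapply(groups, function(g) colMeans(X[y == g, , drop = FALSE])))

# f_i(x) = mu_i C^-1 x^T - 0.5 * mu_i C^-1 mu_i^T + ln(p_i)
discriminant <- function(x) {
  sapply(seq_along(groups), function(i) {
    m <- mu[i, ]
    drop(m %*% C_inv %*% x) - 0.5 * drop(m %*% C_inv %*% m) + log(priors[i])
  })
}

# Score every training row plus the new point, then pick the larger f_i.
scores <- t(apply(rbind(X, x_new), 1, discriminant))
colnames(scores) <- c("f1", "f2")
predicted <- groups[apply(scores, 1, which.max)]
print(cbind(round(scores, 3), class = predicted))
```

The new chip ring sits very close to the boundary (f₁ and f₂ differ by only a few hundredths), which is why it is worth computing with full precision rather than with the rounded matrices.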


Examples:

https://real-statistics.com/multivariate-statistics/discriminant-analysis/linear-discriminant-analysis/

Example 1: Perform discriminant analysis on the data in Example 1 of MANOVA Basic Concepts. This data is repeated in Figure 1 (in two columns for easier readability). Also determine in which category to put the vector X with yield 60, water 25 and herbicide 6.


library(klaR)
library(psych)     # pairs.panels()
library(MASS)      # lda()
library(devtools)
data("iris")
str(iris)

First, create a scatterplot matrix of the four numeric variables, coloured by species, with no gap between the panels.

pairs.panels(iris[1:4],
             gap = 0,
             bg = c("red", "green", "blue")[iris$Species],
             pch = 21)


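Let's create a training dataset and a test dataset for fitting and evaluation. The split below assigns each row to the training set with probability 0.99 and to the test set with probability 0.01, so nearly all rows are used for training (149 training rows in the output that follows).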

set.seed(123)
ind <- sample(2, nrow(iris), replace = TRUE, prob = c(0.99, 0.01))
training <- iris[ind == 1, ]
testing <- iris[ind == 2, ]

linear <- lda(Species ~ ., data = training)
linear

Call:
lda(Species ~ ., data = training)

Prior probabilities of groups:
    setosa versicolor  virginica
 0.3288591  0.3355705  0.3355705

Group means:
           Sepal.Length Sepal.Width Petal.Length Petal.Width
setosa         5.004082    3.430612     1.457143   0.2408163
versicolor     5.936000    2.770000     4.260000   1.3260000
virginica      6.588000    2.974000     5.552000   2.0260000

Coefficients of linear discriminants:
                    LD1        LD2
Sepal.Length  0.8145327  0.0188473
Sepal.Width   1.5593494  2.1672295
Petal.Length -2.1751998 -0.9203999
Petal.Width  -2.8741576  2.8124736

Coefficients of linear discriminants: these give the linear combinations of the predictor variables that form the LDA decision rule. For example,

LD1 = 0.8145327*Sepal.Length + 1.5593494*Sepal.Width - 2.1751998*Petal.Length - 2.8741576*Petal.Width.

Similarly,

LD2 = 0.0188473*Sepal.Length + 2.1672295*Sepal.Width - 0.9203999*Petal.Length + 2.8124736*Petal.Width.
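To see how these coefficients turn into discriminant scores, the sketch below computes the scores by hand and compares them with `predict()`. It assumes, based on how `predict()` for `lda` objects in MASS behaves, that the predictors are first centred on the prior-weighted average of the group means, so the scores equal the linear combinations above only up to an additive constant. Variable names such as `scores_manual` are illustrative, and the exact split (and therefore the numbers printed) may depend on the R version.

```r
library(MASS)  # provides lda()

set.seed(123)
ind <- sample(2, nrow(iris), replace = TRUE, prob = c(0.99, 0.01))
training <- iris[ind == 1, ]
linear <- lda(Species ~ ., data = training)

# Scores as returned by the fitted model.
scores_pred <- predict(linear, training)$x

# The same scores by hand: centre the predictors, then apply the coefficients.
centre <- colSums(linear$prior * linear$means)  # prior-weighted mean of group means
X_centred <- scale(as.matrix(training[, 1:4]), center = centre, scale = FALSE)
scores_manual <- X_centred %*% linear$scaling

print(head(round(scores_manual, 4)))
print(max(abs(scores_manual - scores_pred)))  # should be numerically zero
```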

Proportion of trace:
   LD1    LD2
0.9913 0.0087

Based on the prior probabilities above, about 32.9% of the training observations belong to the setosa group, 33.6% to versicolor and 33.6% to virginica. The first discriminant function accounts for 99.13% of the between-group separation and the second for 0.87%.
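Both quantities can be read directly off the fitted object. The sketch below assumes the `svd` and `prior` components of an `lda` fit from MASS, with the proportion of trace computed as the squared singular values divided by their sum. The exact numbers depend on the training split and possibly on the R version, so none are asserted here.

```r
library(MASS)  # provides lda()

set.seed(123)
ind <- sample(2, nrow(iris), replace = TRUE, prob = c(0.99, 0.01))
training <- iris[ind == 1, ]
linear <- lda(Species ~ ., data = training)

# Proportion of trace: share of between-group separation per discriminant.
prop_trace <- linear$svd^2 / sum(linear$svd^2)
names(prop_trace) <- colnames(linear$scaling)
print(round(prop_trace, 4))

# Class proportions in the training data, i.e. the estimated priors.
print(round(linear$prior, 4))
```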