3.3 Classification – Probabilistic Discriminative Model with Maximum Likelihood Estimate (MLE)
The advantage of the probabilistic generative model is that we can create (generate) synthetic input values $\mathbf{x}$ by sampling from the marginal distribution $p(\mathbf{x})$. However, the predictive performance may decrease, especially when the Gaussian form we used to model the class-conditional densities does not give a good representation. In this section, we compute the parameter values in a more direct way, by maximizing the likelihood function or the posterior probability density function (PDF). By not modeling the class-conditional densities explicitly, we have fewer parameters to determine, and this may lead to an increase in predictive performance. Directly determining the parameters in this way is an example of a probabilistic discriminative approach.
The likelihood function we want to maximize to determine the parameters consists of the conditional distributions introduced earlier, $p(C_k \mid \mathbf{x})$. We first relabel the variables and then simplify as much as possible to avoid clutter in the mathematical expressions. In the previous section, we obtained the functional form of the posterior class probability, conditional on an input vector, as (3.4). Using this definition, let us define

$$y_k(\mathbf{x}) = p(C_k \mid \mathbf{x}, \tilde{\mathbf{w}}_k) = \frac{\exp\!\big(a_k(\tilde{\mathbf{x}}, \tilde{\mathbf{w}}_k)\big)}{\sum_j \exp\!\big(a_j(\tilde{\mathbf{x}}, \tilde{\mathbf{w}}_j)\big)} \qquad (3.29)$$

where the $a_k$ are called activations and are given by
$$a_k(\tilde{\mathbf{x}}, \tilde{\mathbf{w}}_k) = \tilde{\mathbf{w}}_k^T \tilde{\mathbf{x}} \qquad (3.30)$$

Let us now clarify the terms that involve the tilde: in the expressions above, $\tilde{\mathbf{w}}_k = (w_{k0}, \mathbf{w}_k^T)^T$ and $\tilde{\mathbf{x}} = (1, \mathbf{x}^T)^T$; that is, we augment the input vector with a dummy input $x_0 = 1$, similar to what we did in the least squares classification in Appendix C. To reduce the clutter in the mathematical notation, let us redefine the parameters ($\tilde{\mathbf{w}} \to \mathbf{w}$ and $\tilde{\mathbf{x}} \to \mathbf{x}$) such that $\mathbf{w}_k = (w_{k0}, w_{k1}, \ldots, w_{kD})^T$ and $\mathbf{x} = (1, x_1, x_2, \ldots, x_D)^T$. Then, we consider maximization of the likelihood function to determine the parameters $\{\mathbf{w}_k\}$ directly.
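To make (3.29)–(3.30) concrete, the following sketch (in Python/NumPy, purely illustrative and not part of the original analysis) evaluates the class posteriors for a batch of inputs after augmenting them with the dummy input $x_0 = 1$; the function and variable names are assumptions chosen for the example.

```python
import numpy as np

def class_posteriors(X, W):
    """Evaluate y_k(x) in (3.29) for every row of X.

    X : (N, D) array of raw inputs (before augmentation).
    W : (D + 1, K) array whose k-th column is w_k = (w_k0, w_k1, ..., w_kD)^T.
    Returns an (N, K) array of posterior class probabilities.
    """
    N = X.shape[0]
    X_aug = np.hstack([np.ones((N, 1)), X])        # prepend the dummy input x0 = 1
    A = X_aug @ W                                  # activations a_k = w_k^T x, eq. (3.30)
    A -= A.max(axis=1, keepdims=True)              # shift by the row max; softmax is unchanged
    expA = np.exp(A)
    return expA / expA.sum(axis=1, keepdims=True)  # softmax over the K classes
```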
Now, we need the likelihood function. As we mentioned above, it consists of the posterior class probabilities $p(C_k \mid \mathbf{x})$ if the prior on the $C_k$'s is uniform. We will follow the same 1-of-$K$ coding scheme as we did above for the target vectors: the target vector $\mathbf{t}_n$ associated with the input vector $\mathbf{x}_n$, which is assigned to class $C_k$, will be a unit vector of dimension $K = 3$ with each of its elements being zero except the $k$th element, which is one. Then, we obtain the likelihood function as
$$p(\mathbf{T} \mid \mathbf{X}, \mathbf{W}) = \prod_{n=1}^{N} \prod_{k=1}^{K} p(C_k \mid \mathbf{x}_n, \mathbf{w}_k)^{t_{nk}} = \prod_{n=1}^{N} \prod_{k=1}^{K} y_{nk}^{t_{nk}} \qquad (3.31)$$

where the elements $t_{nk}$ form the matrix $\mathbf{T}$, whose dimension is $N \times K$ with $N$ the number of data points and $K$ the number of classes; $\mathbf{W}$ is formed by the $(D+1)$-dimensional vector $\mathbf{w}_k$ as its $k$th column; and $\mathbf{X}$ is formed by the $(D+1)$-dimensional vector $\mathbf{x}_n^T$ as its $n$th row. So, $\mathbf{W}$ and $\mathbf{X}$ are matrices with dimensions $(D+1) \times K$ and $N \times (D+1)$, respectively. We also have
$$y_{nk} = y_k(\mathbf{x}_n) = p(C_k \mid \mathbf{x}_n, \mathbf{w}_k) = \frac{\exp(\mathbf{w}_k^T \mathbf{x}_n)}{\sum_j \exp(\mathbf{w}_j^T \mathbf{x}_n)} \in [0, 1] \qquad (3.32)$$

Before we evaluate the probabilistic discriminative model from a Bayesian perspective, let us use the maximum likelihood method to find $\mathbf{W}_{MLE}$ by maximizing the likelihood function given by (3.31). Note that the value we will find is in fact the Bayesian maximum a posteriori (MAP) value, but since we take a flat (non-informative) prior for $\mathbf{W}$, MAP $\equiv$ MLE. We solved this optimization problem with an algorithm provided by Matlab.
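The optimization itself was carried out in Matlab; as an illustrative sketch only (not the routine actually used), the Python code below performs the same maximization by minimizing the negative log-likelihood of (3.31), $E(\mathbf{W}) = -\sum_n \sum_k t_{nk} \ln y_{nk}$, with `scipy.optimize.minimize`, supplying the standard cross-entropy gradient $\nabla_{\mathbf{w}_j} E = \sum_n (y_{nj} - t_{nj})\,\mathbf{x}_n$. The function name and the choice of BFGS are assumptions for this sketch.

```python
import numpy as np
from scipy.optimize import minimize

def fit_softmax_mle(X_aug, T):
    """Maximize the likelihood (3.31) for W given augmented inputs and 1-of-K targets.

    X_aug : (N, D + 1) array, each row is (1, x1, ..., xD).
    T     : (N, K) 1-of-K target matrix.
    Returns W_mle as a (D + 1, K) array whose k-th column is w_k.
    """
    N, D1 = X_aug.shape
    K = T.shape[1]

    def neg_log_likelihood_and_grad(w_flat):
        W = w_flat.reshape(D1, K)
        A = X_aug @ W
        A -= A.max(axis=1, keepdims=True)           # numerical stability
        Y = np.exp(A)
        Y /= Y.sum(axis=1, keepdims=True)           # y_nk, eq. (3.32)
        nll = -np.sum(T * np.log(Y + 1e-12))        # -ln p(T | X, W), from eq. (3.31)
        grad = X_aug.T @ (Y - T)                    # cross-entropy gradient w.r.t. W
        return nll, grad.ravel()

    w0 = np.zeros(D1 * K)                           # start from W = 0
    res = minimize(neg_log_likelihood_and_grad, w0, jac=True, method="BFGS")
    return res.x.reshape(D1, K)
```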
We maximized (3.31) with respect to $\mathbf{W}$ separately for the acceleration, velocity, and displacement inputs and obtained the following confusion matrices by assigning an input vector $\mathbf{x}$ to the class $C_k$ for which $p(C_k \mid \mathbf{x}, \mathbf{w}_k)$ is maximum over $k = 1, 2, 3$. As with the previous confusion matrices, we show the predictive performance of our models by cross-validation: we divide each data set into two, a training set and a validation set; we then swap these data sets and average the predictive performances in the form of confusion matrices (a short illustrative sketch of this procedure follows Table 3.4). Let us start with the acceleration results. The maximum likelihood estimate (MLE) of the parameter matrix, $\mathbf{W}_{MLE}^{\text{acceleration}}$, computed using the entire acceleration data set for training, is given by
$$\mathbf{W}_{MLE}^{\text{acceleration}} = \begin{bmatrix} 2.899 & -1.263 & -1.606 \\ -0.003 & 0.018 & 0.003 \\ 0.002 & 0.013 & -0.005 \\ -0.002 & -0.118 & 0.075 \\ -0.008 & -0.067 & 0.117 \end{bmatrix} \qquad (3.33)$$
Table 3.4: Confusion matrices for probabilistic discriminative classification with MLE using acceleration data (entries in %; rows are actual classes, columns are predicted classes).

ALL DATA – Acceleration
Actual \ Predicted     Okay    Over    Under
Okay                   95.2     2.0      2.8
Over                   16.8    83.2      0.0
Under                  21.2     0.0     78.8

FIRST HALF – Acceleration
Actual \ Predicted     Okay    Over    Under
Okay                   88.4     1.4     10.2
Over                   12.0    88.0      0.0
Under                   9.6     0.0     90.4

SECOND HALF – Acceleration
Actual \ Predicted     Okay    Over    Under
Okay                   97.2     1.8      1.0
Over                   37.6    62.4      0.0
Under                  39.2     0.0     60.8

AVERAGE OF CROSS VALIDATIONS – Acceleration
Actual \ Predicted     Okay    Over    Under
Okay                   92.8     1.6      5.6
Over                   24.8    75.2      0.0
Under                  24.4     0.0     75.6
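As a brief illustration of the class-assignment rule and the two-fold cross-validation described above (train on one half, validate on the other, swap, and average), the hypothetical helpers below compute a row-normalized confusion matrix in percent; the names, and the reuse of `fit_softmax_mle` from the earlier sketch, are assumptions for illustration only.

```python
import numpy as np

def predict_class(X_aug, W):
    """Assign each input to the class C_k with the largest posterior p(C_k | x, w_k)."""
    return np.argmax(X_aug @ W, axis=1)            # argmax of activations = argmax of softmax

def confusion_matrix_percent(labels_true, labels_pred, K=3):
    """Row-normalized confusion matrix in percent (rows: actual class, columns: predicted class)."""
    C = np.zeros((K, K))
    for t, p in zip(labels_true, labels_pred):
        C[t, p] += 1.0
    return 100.0 * C / C.sum(axis=1, keepdims=True)

# Two-fold cross-validation as in the tables (hypothetical arrays X1_aug, T1, X2_aug, T2):
#   W_12 = fit_softmax_mle(X1_aug, T1)   # train on one half
#   C_12 = confusion_matrix_percent(T2.argmax(axis=1), predict_class(X2_aug, W_12))
#   W_21 = fit_softmax_mle(X2_aug, T2)   # train on the other half
#   C_21 = confusion_matrix_percent(T1.argmax(axis=1), predict_class(X1_aug, W_21))
#   C_avg = 0.5 * (C_12 + C_21)          # the "average of cross validations" block
```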
$\mathbf{W}_{MLE}^{\text{velocity}}$, computed using the entire velocity data set for training, is given by

$$\mathbf{W}_{MLE}^{\text{velocity}} = \begin{bmatrix} 2.743 & -1.150 & -1.563 \\ -0.001 & 0.028 & 0.003 \\ 0.001 & 0.011 & -0.006 \\ 0.022 & -0.184 & 0.114 \\ -0.024 & -0.039 & 0.091 \end{bmatrix} \qquad (3.34)$$
Table 3.5: Confusion matrices for probabilistic discriminative classification with MLE using velocity data (entries in %; rows are actual classes, columns are predicted classes).

ALL DATA – Velocity
Actual \ Predicted     Okay    Over    Under
Okay                   95.3     1.9      2.8
Over                   19.2    80.8      0.0
Under                  22.8     0.0     77.2

FIRST HALF – Velocity
Actual \ Predicted     Okay    Over    Under
Okay                   87.6     1.8     10.6
Over                   14.4    85.6      0.0
Under                   9.6     0.0     90.4

SECOND HALF – Velocity
Actual \ Predicted     Okay    Over    Under
Okay                   96.0     1.6      2.4
Over                   46.4    53.6      0.0
Under                  44.0     0.0     56.0

AVERAGE OF CROSS VALIDATIONS – Velocity
Actual \ Predicted     Okay    Over    Under
Okay                   91.8     1.7      6.5
Over                   30.4    69.6      0.0
Under                  26.8     0.0     73.2
$\mathbf{W}_{MLE}^{\text{displacement}}$, computed using the entire displacement data set for training, is given by

$$\mathbf{W}_{MLE}^{\text{displacement}} = \begin{bmatrix} 2.680 & -0.910 & -1.757 \\ 0.0002 & 0.052 & -0.015 \\ 0.0003 & 0.012 & -0.020 \\ 0.024 & -0.294 & 0.264 \\ -0.056 & -0.059 & 0.130 \end{bmatrix} \qquad (3.35)$$
Table 3.6: Confusion matrices for probabilistic discriminative classification with MLE using displacement data (entries in %; rows are actual classes, columns are predicted classes).

ALL DATA – Displacement
Actual \ Predicted     Okay    Over    Under
Okay                   94.3     2.8      2.9
Over                   30.8    69.2      0.0
Under                  22.4     0.0     77.6

FIRST HALF – Displacement
Actual \ Predicted     Okay    Over    Under
Okay                   91.6     2.6      5.8
Over                   24.8    75.2      0.0
Under                  10.4     0.8     88.8

SECOND HALF – Displacement
Actual \ Predicted     Okay    Over    Under
Okay                   94.4     2.2      3.4
Over                   50.4    49.6      0.0
Under                  49.6     0.0     50.4

AVERAGE OF CROSS VALIDATIONS – Displacement
Actual \ Predicted     Okay    Over    Under
Okay                   93.0     2.4      4.6
Over                   37.6    62.4      0.0
Under                  30.0     0.4     69.6