
2.2 SVM Classifier

2.2.2 Multiclass SVM

[Figure 2.3 schematic: training samples labelled + and − in the input space (x1, x2) and their images φ(+) and φ(−) after mapping to the high-dimensional feature space.]

Figure 2.3 Mapping from the input space to the high dimensional space

The above formulation of the SVM is based on binary classification; the multiclass case can be handled by combining binary classifiers.

To solve a multiclass SVM using binary classifiers, different coupling strategies have been developed, such as one-versus-one (OVO), one-versus-all (OVA), and the directed acyclic graph (DAG) (Hsu and Lin, 2002). In these methods, the multiclass problem is first decomposed into a number of binary problems. A classifier is then trained for each binary problem, and finally the binary SVMs are combined to reconstruct the solution of the multiclass problem from the outcomes of the individual classifiers. For example, for a k-class problem the OVO approach constructs k(k−1)/2 binary classifiers, one for each pair of classes; a voting strategy is then applied to the classifier outputs, and the class with the maximum votes is selected. The OVA approach constructs k binary SVM classifiers, each trained to separate one class from the rest, and the class whose classifier gives the largest output is selected. The DAG is a modification of the OVO: its training is identical to the OVO, but for testing it employs a rooted binary directed acyclic graph with k(k−1)/2 internal nodes and k leaves.
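As a purely illustrative sketch of the OVO decomposition (not the implementation used in this study), the following Python code trains one binary RBF-kernel SVM per pair of classes with scikit-learn; the function name train_ovo_classifiers and the arrays X and y are hypothetical placeholders.

```python
from itertools import combinations

import numpy as np
from sklearn.svm import SVC


def train_ovo_classifiers(X, y, C=1.0, gamma=0.1):
    """Train one binary RBF-kernel SVM for every pair of classes (k(k-1)/2 in total)."""
    classes = np.unique(y)
    pairwise_svms = {}
    for i, j in combinations(classes, 2):      # one classifier per pair of classes
        mask = np.isin(y, [i, j])              # keep only the samples of classes i and j
        clf = SVC(kernel="rbf", C=C, gamma=gamma)
        clf.fit(X[mask], y[mask])              # separates class i from class j only
        pairwise_svms[(i, j)] = clf
    return pairwise_svms
```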

Hsu and Lin (2002) compared the all-together approach with three approaches based on binary classifiers, namely the OVO, the OVA, and the DAG. They experimentally concluded that the OVO is a competitive approach for practical use because of its advantages over the other methods, such as higher generalization accuracy and lower training time. Therefore, the OVO method has been adopted in this study. Figure 2.4 demonstrates the OVO method of the SVM: for a three-class problem, the OVO constructs three binary SVMs according to the relation k(k−1)/2, and each SVM is trained and solved for one pair of classes.

Figure 2.4 shows that Class 2 receives the maximum votes among the three classes, so this class is finally selected according to the Max Wins strategy. If two classes have identical votes, the class with the smaller index is chosen.

[Figure 2.4 schematic: an input vector is passed to three binary SVMs (Class 1 versus Class 2, Class 1 versus Class 3, Class 2 versus Class 3); the maximum voting criterion over their outputs selects Class 2.]

Figure 2.4 One-versus-one multiclass method of the SVM
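The Max Wins voting rule of Figure 2.4 can be sketched as follows, again only for illustration. It assumes the dictionary of pairwise classifiers produced by the hypothetical train_ovo_classifiers sketch above and integer class labels 0, …, k−1; note that np.argmax returns the first maximum, which reproduces the smaller-index tie-break.

```python
import numpy as np


def predict_max_wins(pairwise_svms, x, n_classes):
    """Predict the class of a single sample x by pairwise voting (Max Wins)."""
    votes = np.zeros(n_classes, dtype=int)
    for clf in pairwise_svms.values():
        winner = clf.predict(x.reshape(1, -1))[0]  # the binary SVM outputs class i or class j
        votes[winner] += 1                         # one vote for the winning class
    return int(np.argmax(votes))                   # ties resolved in favour of the smaller index
```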

For training examples from the ith and jth classes, the following binary classification optimization problem is solved:

$$
\min_{w^{ij},\, b^{ij},\, \xi^{ij}} \;\; \frac{1}{2}\,(w^{ij})^{T} w^{ij} \;+\; C \sum_{t} \xi_{t}^{ij} \tag{2.31}
$$

subject to

$$
(w^{ij})^{T}\phi(x_{t}) + b^{ij} \;\ge\; 1 - \xi_{t}^{ij}, \quad \text{if } y_{t} = i, \tag{2.32}
$$

$$
(w^{ij})^{T}\phi(x_{t}) + b^{ij} \;\le\; -1 + \xi_{t}^{ij}, \quad \text{if } y_{t} = j, \tag{2.33}
$$

and

$$
\xi_{t}^{ij} \;\ge\; 0, \quad \forall t, \tag{2.34}
$$

where, similar to the binary SVM problem, $\phi(x_t)$ is the mapping function, $(x_t, y_t)$ is a training sample, $w^{ij}$ and $b^{ij}$ are the weight vector and bias term of the classifier for classes $i$ and $j$, $\xi_t^{ij}$ is the slack variable, and $C$ is the penalty parameter. For testing on unseen examples, the following voting approach is adopted. Each binary classification model casts a vote for every data sample $x$: if $\operatorname{sign}\big((w^{ij})^{T}\phi(x) + b^{ij}\big)$ decides that $x$ belongs to the $i$th class, the vote for this class is increased by one; otherwise the vote for the $j$th class is increased by one. The sample $x$ is then predicted to be in the class with the maximum votes. This approach is also known as the Max Wins strategy (Kressel, 1999). If two classes have identical votes, the class with the smaller index is chosen. For more on the SVM, interested readers may refer to Vapnik (1995).
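For comparison, the same pairwise strategy is available off the shelf in scikit-learn (this is only an illustration, not the tooling of the study): SVC with decision_function_shape="ovo" exposes the k(k−1)/2 pairwise decision values $(w^{ij})^{T}\phi(x) + b^{ij}$, and predict() applies the same Max Wins voting. The iris data set below is a three-class stand-in, not the data of this study.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Three-class data set used only as a stand-in for the study's data.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = SVC(kernel="rbf", C=1.0, gamma=0.1, decision_function_shape="ovo")
clf.fit(X_train, y_train)

pairwise_values = clf.decision_function(X_test)   # shape (n_samples, k(k-1)/2) = (n_samples, 3)
predictions = clf.predict(X_test)                 # Max Wins voting over the three pairs
```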

2.3 Cross-Validation Method Along with Grid Search for SVM Parameter Selection

The performance of the SVM depends on the selection of its parameters. In this study, the multiclass SVM is used with the RBF kernel, which involves two parameters: the penalty parameter (soft-margin constant) C and the inverse-width parameter of the RBF kernel, γ. These parameters define the decision boundary of the classifier. Therefore, the objective is to choose a pair (C, γ) that provides the best prediction performance of the SVM. The standard method of choosing (C, γ) is the cross-validation (CV) method along with a grid search, as shown in Figure 2.5 (Hsu et al., 2003). Here the 5-fold CV method is used: the available data are first divided into 5 subsets of equal size, and in turn each subset serves as the testing data while the remaining four subsets (roughly 4/5 of the data) serve as the training data. Each instance of the whole training data is therefore predicted exactly once, and the CV accuracy is the percentage of data that are correctly classified.
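As a minimal sketch of the 5-fold CV accuracy described above, for a single candidate (C, γ) pair, scikit-learn's cross_val_score and the iris data set are used below purely for illustration; they are not part of this study.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)                   # stand-in training data

clf = SVC(kernel="rbf", C=3.0, gamma=3.0 ** -3)     # one candidate (C, gamma) pair
fold_accuracies = cross_val_score(clf, X, y, cv=5)  # train on 4 folds, test on the 5th, 5 times
cv_accuracy = fold_accuracies.mean()                # fraction of data correctly validated
print(f"5-fold CV accuracy: {cv_accuracy:.3f}")
```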

Along with the cross-validation, a grid search is carried out to choose C and γ by defining a two-dimensional grid. For practical purposes, the grid points are chosen on a logarithmic scale that covers a large range of these parameters. In this study, C and γ are given as exponentially growing sequences of powers of 3, with C ranging from 3^−5 to 3^15 and γ from 3^−15 to 3^3. The CV accuracy is first estimated for the pairs (C, γ) on this coarse grid. After a better region of the grid is identified, a finer grid search is performed over that region. Finally, the combination (C, γ) that gives the highest CV accuracy is retained. After identifying the best (C, γ), the SVM is trained again on the whole training data to generate the final classifier model.

[Figure 2.5 flowchart: training data set and initial (C, γ) pair → training the SVM classifier using 5-fold cross-validation → average CV accuracy → termination criterion; if not met, the grid search generates a new (C, γ) pair, otherwise the optimized (C, γ) pair is returned.]

Figure 2.5 The cross validation method along with grid search for SVM parameter selection
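The procedure of Figure 2.5 can be sketched with scikit-learn's GridSearchCV, again purely as an illustration under stated assumptions (the base-3 exponent ranges quoted above, iris data as a stand-in for the study's training data); refit=True retrains the final classifier on the whole training set with the best (C, γ), as described in the text.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)                   # stand-in training data

param_grid = {
    "C": 3.0 ** np.arange(-5, 16),                  # 3^-5 ... 3^15
    "gamma": 3.0 ** np.arange(-15, 4),              # 3^-15 ... 3^3
}

search = GridSearchCV(
    SVC(kernel="rbf"),
    param_grid,
    scoring="accuracy",
    cv=5,                                           # 5-fold cross-validation at each grid point
    refit=True,                                     # retrain on all training data with the best pair
)
search.fit(X, y)

print("best (C, gamma):", search.best_params_)
print("best CV accuracy:", search.best_score_)
final_classifier = search.best_estimator_           # final model trained on the whole training set
```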

The cross-validation can help avoid the over-fitting problem (Hsu et al., 2003). To illustrate this issue, a binary classification problem is shown in Figure 2.6. Hollow circles and rectangles are the training data, while filled circles and rectangles are the testing data. Figures 2.6(a) and (b) show an over-fitted classifier: its training accuracy is high, but its testing accuracy is low. Figures 2.6(c) and (d) show a classifier without overfitting, which therefore gives better CV accuracy as well as better testing accuracy.

(a) Training data and an overfitting classifier
(b) Applying an overfitting classifier on testing data
(c) Training data and a better classifier
(d) Applying a better classifier on testing data

Figure 2.6 An overfitting classifier and a better classifier