value at the current point. It expands and shrinks the search space automatically and does not consume as much time as GS or GA. As far as we know, this may be the first attempt to introduce direct search to optimize the parameters of support vector machines.

The main purpose of this chapter is to propose LSSVM-based credit scoring models that use the direct search method for parameter selection. The rest of this chapter is organized as follows. In Section 3.2, the LSSVM and DS methodology are described briefly. Section 3.3 presents a computational experiment that demonstrates the effectiveness and efficiency of the model and compares the performance of DS with that of the DOE, GA, and GS methods. Section 3.4 gives concluding remarks.

3.2 Methodology Description

In this section, a brief introduction of least squares SVM (LSSVM) is first provided and then the direct search method for parameter selection of LSSVM is proposed.

3.2.1 Brief Review of LSSVM

Given a training dataset $\{x_k, y_k\}_{k=1}^{N}$, where the input data $x_k \in \mathbb{R}^n$ and the corresponding output $y_k \in \mathbb{R}$ with $y_k \in \{1, -1\}$. If the set is linearly separable, the classifier should be constructed as follows:

$$
\begin{cases}
w^T x_k + b \ge 1 & \text{if } y_k = 1 \\
w^T x_k + b \le -1 & \text{if } y_k = -1
\end{cases}
\tag{3.1}
$$

The separating hyperplane is

$$
H(x) = w^T x + b = 0 \tag{3.2}
$$

and the linear classifier is

$$
y(x) = \operatorname{sign}(w^T x + b) \tag{3.3}
$$
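As a minimal sketch of Eqs. (3.1) and (3.3), the snippet below evaluates the linear classifier on a small hypothetical dataset and checks the separability constraints in the equivalent form y_k (w^T x_k + b) >= 1. The function names, the toy data, and the hand-picked values of w and b are illustrative assumptions, not values from the chapter.

```python
import numpy as np

def linear_classify(w, b, X):
    """Return sign(w^T x + b) for each row x of X, as in Eq. (3.3)."""
    return np.sign(X @ w + b)

def satisfies_margin(w, b, X, y):
    """Check Eq. (3.1) in the combined form y_k (w^T x_k + b) >= 1 for all k."""
    return bool(np.all(y * (X @ w + b) >= 1.0))

# Hypothetical toy data: two points per class in the plane.
X = np.array([[2.0, 2.0], [3.0, 1.0], [-2.0, -1.0], [-1.0, -3.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Illustrative hyperplane parameters chosen by hand, not fitted.
w = np.array([0.5, 0.5])
b = 0.0

labels = linear_classify(w, b, X)
separable = satisfies_margin(w, b, X, y)
print(labels, separable)
```

Any (w, b) that passes the margin check classifies the training points without error; the optimization problem that follows selects, among such pairs, the one with the smallest w^T w.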

SVM finds an optimal separating hyperplane that separates all the training data points without error and maximizes the distance between the hyperplane and the points closest to it, by solving the following optimization problem:

44 3 Credit Risk Evaluation Using SVM with Direct Search

$$
\begin{cases}
\min\limits_{w,b} \; \dfrac{1}{2} w^T w \\
\text{s.t. } y_k (w^T x_k + b) \ge 1, \quad k = 1, \ldots, N
\end{cases}
\tag{3.4}
$$

Most real-life problems are non-separable, which means that we cannot find a perfect separating hyperplane. For this case, LSSVM introduces an error variable $\xi_k$ for each sample so that misclassification can be tolerated. Consequently, the training objective is not only to maximize the classification margin but also to minimize the sum of squared errors over all samples. Since it is almost impossible to reach the optimum of both objectives at the same time, there must be a trade-off between them, as shown below (Suykens, 1999; Suykens et al., 2002). Note that the objective function of LSSVM is different from that of the standard SVM proposed by Vapnik (1995).

$$
\begin{cases}
\min\limits_{w,b,\xi} \; J(w, b, \xi) = \dfrac{1}{2} w^T w + C \sum\limits_{k=1}^{N} \xi_k^2 \\
\text{s.t. } y_k (w^T x_k + b) \ge 1 - \xi_k, \quad k = 1, \ldots, N
\end{cases}
\tag{3.5}
$$

where C is the upper bound parameter on the training error. The Lagrangian function for this problem can be represented by

$$
J(w, b, \xi, \alpha) = \frac{1}{2} w^T w + C \sum_{k=1}^{N} \xi_k^2 - \sum_{k=1}^{N} \alpha_k \left[ y_k (w^T x_k + b) - 1 + \xi_k \right]
\tag{3.6}
$$

where $\alpha_k$ are the Lagrange multipliers. We obtain the conditions for optimality by differentiating (3.6) with respect to $w$, $b$, $\xi_k$, and $\alpha_k$ for $k = 1, \ldots, N$ as follows:

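$$
\begin{cases}
\dfrac{\partial J}{\partial w} = 0 \;\Rightarrow\; w = \sum\limits_{k=1}^{N} \alpha_k y_k x_k \\[2ex]
\dfrac{\partial J}{\partial b} = 0 \;\Rightarrow\; \sum\limits_{k=1}^{N} \alpha_k y_k = 0 \\[2ex]
\dfrac{\partial J}{\partial \xi_k} = 0 \;\Rightarrow\; \alpha_k = 2 C \xi_k \\[2ex]
\dfrac{\partial J}{\partial \alpha_k} = 0 \;\Rightarrow\; y_k (w^T x_k + b) - 1 + \xi_k = 0
\end{cases}
\tag{3.7}
$$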
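One way to see how the conditions in (3.7) lead to a solution is to eliminate $w$ and $\xi_k$, which leaves a linear system in $b$ and $\alpha$. The worked step below is a sketch under the assumption that the squared-error term in the objective is $C \sum_k \xi_k^2$, so that $\xi_k = \alpha_k / (2C)$; the symbols $\Omega$ and $\mathbf{1}_N$ are introduced here for illustration only.

```latex
% Substitute w = \sum_j \alpha_j y_j x_j and \xi_k = \alpha_k / (2C)
% into the last condition of (3.7):
y_k \Big( \sum_{j=1}^{N} \alpha_j y_j \, x_j^T x_k + b \Big) - 1 + \frac{\alpha_k}{2C} = 0,
\qquad k = 1, \ldots, N

% Together with \sum_k \alpha_k y_k = 0, and using y_k^2 = 1, this reads in matrix form:
\begin{bmatrix} 0 & y^T \\ y & \Omega + \frac{1}{2C} I \end{bmatrix}
\begin{bmatrix} b \\ \alpha \end{bmatrix}
=
\begin{bmatrix} 0 \\ \mathbf{1}_N \end{bmatrix},
\qquad \Omega_{kj} = y_k y_j \, x_k^T x_j
```

Here $y = (y_1, \ldots, y_N)^T$, $\alpha = (\alpha_1, \ldots, \alpha_N)^T$, and $\mathbf{1}_N$ is the vector of $N$ ones, so training reduces to solving $N + 1$ linear equations rather than a quadratic program.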

From the solution of equations (3.7), we obtain the following classifier:

$$
y(x) = \operatorname{sign}(w^T x + b) = \operatorname{sign}\left[ \sum_{k=1}^{N} \alpha_k y_k \left( x_k^T \cdot x \right) + b \right]
\tag{3.8}
$$

The linear SVM has been extended to a nonlinear SVM. The main idea is to map the input data into a high-dimensional feature space, which can be infinite-dimensional, and then to construct the linear separating hyperplane in that feature space. Let the mapping function be denoted by ϕ(x). We can formally replace x by ϕ(x) in (3.6), and let K(xk, x) be the inner product kernel performing the nonlinear mapping into the higher-dimensional feature space, so that

$$
K(x_k, x) = \phi(x_k)^T \cdot \phi(x) \tag{3.9}
$$

Finally, no explicit construction of the mapping function ϕ(x) is required; only a kernel function is needed instead, and the LSSVM classifier takes the following form:

$$
y(x) = \operatorname{sign}\left[ \sum_{k=1}^{N} \alpha_k y_k K(x_k, x) + b \right]
\tag{3.10}
$$

Some typical kernel functions include the linear function, the polynomial function, and the Gaussian function, which are listed in Chapter 2.

In this chapter, we use the LSSVM with the Gaussian kernel function to implement the credit risk evaluation tasks. In the Gaussian-kernel-based LSSVM, the two main parameters, the upper bound parameter C and the kernel parameter σ, are often not set optimally in practical applications. To obtain better performance, they should be optimized. In the next subsection, we adopt the direct search method to optimize the parameters of LSSVM.
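The following sketch shows how a Gaussian-kernel LSSVM classifier of the form (3.10) could be trained and applied for fixed values of C and σ. It rests on several assumptions that are not stated in this section: the Gaussian kernel is taken as K(a, b) = exp(-||a - b||^2 / (2σ^2)), the squared-error term is C Σ ξ_k^2 so that ξ_k = α_k / (2C), and b and α are obtained by solving the linear system that results from eliminating w and ξ_k in (3.7). All function names and the toy data are hypothetical.

```python
import numpy as np

def gaussian_kernel(A, B, sigma):
    """Assumed kernel form: K(a, b) = exp(-||a - b||^2 / (2 sigma^2))."""
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))

def lssvm_train(X, y, C, sigma):
    """Solve [[0, y^T], [y, Omega + I/(2C)]] [b; alpha] = [0; 1] for b and alpha."""
    n = len(y)
    omega = np.outer(y, y) * gaussian_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = omega + np.eye(n) / (2.0 * C)
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]  # b, alpha

def lssvm_predict(X_train, y_train, alpha, b, sigma, X_new):
    """Evaluate sign(sum_k alpha_k y_k K(x_k, x) + b), as in Eq. (3.10)."""
    K = gaussian_kernel(X_new, X_train, sigma)
    return np.sign(K @ (alpha * y_train) + b)

# Hypothetical XOR-like data that no linear classifier can separate.
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

C, sigma = 10.0, 0.5  # illustrative values, not tuned
b, alpha = lssvm_train(X, y, C, sigma)
pred = lssvm_predict(X, y, alpha, b, sigma, X)
print(pred)
```

Because both C and σ enter the linear system directly, changing either one changes the fitted classifier, which is why the two values are treated as a two-dimensional search problem.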

3.2.2 Direct Search for Parameter Selection

Direct search (DS) methods are a class of simple and straightforward parameter search methods that can be applied almost immediately to many nonlinear optimization problems, especially problems with a low-dimensional search space (Hooke and Jeeves, 1961; Mathworks, 2006). Since the parameter space of LSSVM has dimension 2, direct search is a good choice for parameter selection.

Let the dimension of the search space be n. A point p in this space is denoted by (z1, z2, …, zn), and the objective function by f. The pattern v is a collection of vectors used to determine which points to search relative to the current point: v = [v1, v2, …, v2n], where v1 = [1, 0, …, 0], v2 = [0, 1, 0, …, 0], …, vn = [0, 0, …, 1], vn+1 = [-1, 0, …, 0], vn+2 = [0, -1, 0, …, 0], …, v2n = [0, 0, …, -1], with vi ∈ R^n, i = 1, 2, …, 2n. The set of points around the current point p to be searched is defined by the mesh, which is formed by multiplying the pattern vectors v by a scalar r, called the mesh size, and adding the resulting vectors to the current point. The mesh is denoted by the point set M = {m1, m2, …, m2n}.

For example, if there are two independent parameters in the optimization problem, the pattern v is defined as follows:

v = [v1, v2, v3, v4], v1 = [1, 0], v2 = [0, 1], v3 = [-1, 0], v4 = [0, -1]

Let the current point p be [1.2, 3.4] and the mesh size r = 2; then the mesh contains the following points:

[1.2, 3.4] + 2 × [1, 0] = [3.2, 3.4]

[1.2, 3.4] + 2 × [0, 1] = [1.2, 5.4]

[1.2, 3.4] + 2 × [-1, 0] = [-0.8, 3.4]

[1.2, 3.4] + 2 × [0, -1] = [1.2, 1.4]
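The mesh construction in this example can be sketched in a few lines. The helper names pattern and mesh are hypothetical; the code builds the 2n pattern vectors for a given dimension and adds them, scaled by r, to the current point.

```python
import numpy as np

def pattern(n):
    """Pattern vectors v_1..v_2n: the n unit vectors followed by their negatives."""
    eye = np.eye(n)
    return np.vstack([eye, -eye])

def mesh(p, r):
    """Mesh points m_j = p + r * v_j around the current point p."""
    p = np.asarray(p, dtype=float)
    return p + r * pattern(p.size)

# Current point and mesh size taken from the example in the text.
points = mesh([1.2, 3.4], r=2)
for m in points:
    print(m)
```

For n = 2 the rows of points correspond, in order, to the pattern vectors v1 to v4.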

If there is at least one point in the mesh whose objective function value is less than that of the current point, we say that the poll is successful. The status of the poll is denoted by Flag.

The main steps of direct search algorithm are presented as follows:

(1) Set the initial current point p; pbest = p, ybest = f(p), r = 1.

(2) Form the mesh M = {m1, m2, …, m2n} of the current point p; set Flag = FALSE, j = 1.

    While (j <= 2n) do
        y = f(mj);
        if (y < ybest) then
            ybest = y; pbest = mj; p = mj; Flag = TRUE; break;
        end
        j = j + 1
    End while

(3) If Flag = TRUE, r = 2 × r; else r = 1/2 × r. If a stop criterion is met, stop; else go to (2).
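The three steps can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function name direct_search, the parameter names mesh_tol and max_iter, and the quadratic test objective are assumptions, and only two stopping rules (a mesh-size tolerance and an iteration cap) are implemented.

```python
import numpy as np

def direct_search(f, p0, r=1.0, mesh_tol=1e-6, max_iter=1000):
    """Minimize f by polling the 2n mesh points around the current point."""
    p = np.asarray(p0, dtype=float)
    n = p.size
    directions = np.vstack([np.eye(n), -np.eye(n)])  # pattern vectors v_1..v_2n
    p_best, y_best = p.copy(), f(p)                  # step (1)
    for _ in range(max_iter):
        if r < mesh_tol:                             # assumed stop rule: mesh too small
            break
        success = False                              # Flag = FALSE
        for m in p + r * directions:                 # step (2): poll the mesh in order
            y = f(m)
            if y < y_best:                           # first improving point is accepted
                y_best, p_best, p = y, m.copy(), m.copy()
                success = True
                break
        r = 2.0 * r if success else 0.5 * r          # step (3): expand or shrink the mesh
    return p_best, y_best

def quadratic(z):
    """Hypothetical test objective with its minimum at (1, -2)."""
    return (z[0] - 1.0) ** 2 + (z[1] + 2.0) ** 2

p_best, y_best = direct_search(quadratic, [1.2, 3.4])
print(p_best, y_best)
```

For LSSVM parameter selection, f would take a point (C, σ) and return a quantity to be minimized, such as a validation error; that choice of objective is an assumption here and is not fixed by the steps above.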

The algorithm stops when any of the following conditions occurs: (1) the mesh size is less than the mesh tolerance Mesh_tol; (2) the number of iterations performed by the algorithm reaches the maximum number of iterations Max_iter; (3) the total number of objective function evaluations performed by the algorithm reaches the maximum number of function evaluations Max_fun; (4) the distance between the point found at one successful poll and the point found at the next successful poll is less than the X tolerance X_tol; (5) the change in the objective function from one successful poll to the next successful poll is less than the function tolerance Fun_tol. Using the above direct search algorithm, we can obtain optimal LSSVM model pa-