
7.5 Experimental Results


As can be seen from Fig. 7.3, several findings can be observed. First of all, below the 70%~80% partition rate, the classification performance generally improves as the proportion of training data increases. The main reason is that too little training data (e.g., 20%~30% partition rates) is often insufficient for LSSVM learning. Second, when a 90% partition rate is used, the predictions show slightly worse performance than at the 80% partition rate. The reason is unknown and is worth exploring further with more experiments. Third, across the three testing cases, the performance on Dataset 1 is slightly better than on the other two. A possible reason is that Dataset 1 has fewer features than the other two datasets; the real reason is worth exploring in future work. In summary, the evolving LSSVM learning paradigm with GA-based feature selection is rather robust. The hit ratios for almost all data partitions are above 77.7%, except when the data partition is set to 20% for Dataset 1 (75.67%). Furthermore, the variance of the five-fold cross-validation experiments is small, at 1%~4%. These findings also imply that the feature-evolved LSSVM model can effectively improve credit evaluation performance.
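To make the partition-rate experiment concrete, the following is a minimal Python sketch of the procedure behind Fig. 7.3. It is an illustration under assumptions: a synthetic stand-in dataset from make_classification and scikit-learn's RBF-kernel SVC used as a proxy for the LSSVM classifier; neither the data nor the classifier is the one used in this chapter.

```python
# Sketch of the data-partition-rate sweep (proxy classifier, synthetic data).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=14, random_state=0)
X = StandardScaler().fit_transform(X)

for train_rate in (0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9):
    hits = []
    for seed in range(5):  # repeat splits to mimic the averaged hit ratios
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, train_size=train_rate, random_state=seed, stratify=y)
        clf = SVC(kernel="rbf", C=100.0, gamma="scale").fit(X_tr, y_tr)
        hits.append(clf.score(X_te, y_te))
    print(f"partition {train_rate:.0%}: hit ratio {np.mean(hits):.4f} "
          f"(+/- {np.std(hits):.4f})")
```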

7.5.2 Empirical Analysis of GA-based Parameters Optimization

Table 7.2. Optimal solutions of different parameters for LSSVM

Dataset  Kernel mixed coefficients   Kernel parameters               Upper bound  Fitness value
         λ1     λ2     λ3            d      σ      ρ      θ          C            f(Θ)
1        0.1    0.5    0.4           1.82   3.45   46.66  87.85      189.93       0.7945
         0.2    0.5    0.3           1.75   4.13   54.21  77.67      166.34       0.7889
         0.4    0.4    0.2           1.95   3.08   39.97  93.17      197.15       0.7907
         0.5    0.4    0.1           2.54   5.72   62.11  52.82      108.54       0.7872
2        0.1    0.5    0.4           2.35   5.64   78.22  100.8      95.350       0.7232
         0.2    0.5    0.3           2.78   6.43   72.55  99.98      105.34       0.7087
         0.4    0.4    0.2           3.05   4.48   79.87  85.33      166.29       0.7236
         0.5    0.4    0.1           3.45   5.65   66.67  65.84      184.15       0.7129
3        0.1    0.5    0.4           1.89   3.34   45.76  91.22      177.76       0.7686
         0.2    0.5    0.3           2.08   3.89   38.74  78.82      164.35       0.7731
         0.4    0.4    0.2           2.61   4.55   36.87  95.34      213.54       0.7689
         0.5    0.4    0.1           2.99   6.67   44.67  99.87      239.66       0.7604

As can be seen from Table 7.2, some important and interesting conclusions can be drawn. First of all, observing the kernel mixed coefficients, it is easy to find that in all testing cases the proportion of the RBF kernel (i.e., the coefficient λ2) is the highest. This demonstrates that the RBF kernel has a good ability to increase the generalization capability. Second, when the coefficient λ1 reaches 0.5, the classification performance is the worst in all three testing cases. This implies that the polynomial kernel cannot dominate the mixed kernel for LSSVM learning when a complex learning task is assigned. Third, similar to the RBF kernel, the sigmoid kernel can also increase the generalization ability of LSSVM, i.e., decrease its generalization error rate, as can be observed from the changes in λ3. Fourth, in terms of the fitness function value, we can roughly estimate a rational range for the kernel parameters; for example, for the parameter d, the range [1.5, 3.5] seems to generate good generalization performance. Fifth, for the upper bound parameter, a value larger than 100 seems to be suitable for practical classification applications, perhaps because such a value gives an appropriate emphasis to the misclassification rate.
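As an illustration of how the mixed kernel coefficients, the kernel parameters (d, σ, ρ, θ) and the upper bound C enter the LSSVM model, the Python sketch below builds a kernel of the form K = λ1·polynomial + λ2·RBF + λ3·sigmoid and solves the standard LSSVM linear system. It is a simplified sketch on random data: the exact kernel definitions, the scaling inside the sigmoid term, the integer polynomial degree, and the parameter values are illustrative assumptions, not the chapter's Matlab implementation or the exact values from Table 7.2.

```python
import numpy as np

def mixed_kernel(X1, X2, lam, d, sigma, rho, theta):
    """K = lam1*polynomial(d) + lam2*RBF(sigma) + lam3*sigmoid(rho, theta)."""
    dot = X1 @ X2.T
    sq = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2.0 * dot
    poly = (dot + 1.0) ** d                          # d taken as a positive integer here
    rbf = np.exp(-sq / (2.0 * sigma**2))
    sig = np.tanh(rho * dot / X1.shape[1] + theta)   # scaled dot product to avoid saturation
    return lam[0] * poly + lam[1] * rbf + lam[2] * sig

def lssvm_fit(X, y, C, kernel):
    """Solve [[0, y^T], [y, Omega + I/C]] [b; alpha] = [0; 1] for an LSSVM classifier."""
    n = len(y)
    Omega = np.outer(y, y) * kernel(X, X)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:], A[1:, 0] = y, y
    A[1:, 1:] = Omega + np.eye(n) / C
    sol = np.linalg.solve(A, np.concatenate(([0.0], np.ones(n))))
    return sol[0], sol[1:]                           # bias b, multipliers alpha

def lssvm_predict(X_train, y, b, alpha, X_new, kernel):
    return np.sign(kernel(X_new, X_train) @ (alpha * y) + b)

# usage sketch on random data with labels in {-1, +1}; parameter values are illustrative
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 14))
y = np.where(X[:, 0] + 0.3 * rng.normal(size=200) > 0, 1.0, -1.0)
kern = lambda A, B: mixed_kernel(A, B, lam=(0.1, 0.5, 0.4),
                                 d=2, sigma=3.45, rho=0.5, theta=0.1)
b, alpha = lssvm_fit(X, y, C=100.0, kernel=kern)
print("training hit ratio:", np.mean(lssvm_predict(X, y, b, alpha, X, kern) == y))
```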

In addition, an important issue in parameter evolution is computational time complexity. It is well known that GA is a class of stochastic search algorithms and can be time-consuming. Although GA-based input feature evolution has reduced the modeling complexity to some extent, the parameter search is still a time-consuming process.

For comparison purposes, two commonly used parameter search methods, the grid search (GS) algorithm (Kim, 1997) and the direct search (DS) algorithm (Hooke and Jeeves, 1961; Mathworks, 2006), are used.


In the grid search (GS) algorithm, the grid for each parameter is first defined by a grid range. Each point of the grid, spaced by a unit grid size, is then evaluated by an objective function f, and the point with the best f value corresponds to the optimal parameters. Grid search may offer some protection against local minima, but it is not very efficient, and the optimal results depend on the initial grid range. Usually a larger grid range provides a better chance of reaching the optimal solution but takes more computational time. Interested readers can refer to Kim (1997) for more details about grid search.
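The following Python sketch illustrates the grid search just described: a fitness function f, here a hypothetical five-fold cross-validated accuracy of an RBF-kernel SVC on synthetic data standing in for the LSSVM fitness, is evaluated at every grid point, and the best point is kept. The grid ranges and step sizes below are illustrative assumptions.

```python
# Sketch of a grid search over (sigma, C) with a stand-in fitness function.
import itertools
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=14, random_state=0)

def fitness(sigma, C):
    clf = SVC(kernel="rbf", gamma=1.0 / (2.0 * sigma**2), C=C)
    return cross_val_score(clf, X, y, cv=5).mean()

sigma_grid = np.arange(1.0, 8.0, 0.5)        # unit grid size 0.5, as in the text
C_grid = np.arange(50.0, 251.0, 50.0)        # coarser grid for the upper bound

best = max(itertools.product(sigma_grid, C_grid), key=lambda p: fitness(*p))
print("best (sigma, C):", best, "fitness:", fitness(*best))
```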

The direct search (DS) algorithm is a simple and straightforward search method that can be applied to many nonlinear optimization problems. Suppose the search space dimension is n, a point p in this space is denoted by (z1, z2, …, zn), and the objective function is f. A pattern v is a collection of vectors used to determine which points to search around the current point, i.e., v = [v1, v2, …, v2n], with v1 = [1, 0, …, 0], v2 = [0, 1, …, 0], …, vn = [0, 0, …, 1], vn+1 = [−1, 0, …, 0], vn+2 = [0, −1, …, 0], …, v2n = [0, 0, …, −1], vi ∈ Rn, i = 1, 2, …, 2n. The set of points M = {m1, m2, …, m2n} around the current point p to be searched is defined by a mesh, obtained by multiplying the pattern vectors by a scalar r called the mesh size. If at least one point in the mesh has a better objective function value than the current point p, the current point is replaced with that new point, and the process is repeated until the best point is found. For more details, please refer to Hooke and Jeeves (1961) and Mathworks (2006).
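Below is a compact Python sketch of such a pattern-based direct search in the spirit of Hooke and Jeeves: the 2n mesh points p + r·vi around the current point are polled, the search moves to an improving point if one exists, and otherwise the mesh size r is contracted. The shrink factor, tolerance, and the quadratic test objective are placeholder assumptions, not the Matlab implementation cited above.

```python
import numpy as np

def direct_search(f, p0, r=1.0, shrink=0.5, tol=1e-3, max_iter=100):
    """Minimise f by polling p + r*v_i over the 2n pattern vectors +/- e_i."""
    p = np.asarray(p0, dtype=float)
    fp = f(p)
    n = p.size
    pattern = np.vstack([np.eye(n), -np.eye(n)])      # v_1, ..., v_2n
    for _ in range(max_iter):
        mesh = p + r * pattern                        # candidate points m_1, ..., m_2n
        values = [f(m) for m in mesh]
        best = int(np.argmin(values))
        if values[best] < fp:                         # successful poll: move to the better point
            p, fp = mesh[best], values[best]
        else:                                         # unsuccessful poll: refine the mesh
            r *= shrink
            if r < tol:
                break
    return p, fp

# usage sketch: minimise a simple quadratic placeholder objective
p_opt, f_opt = direct_search(lambda z: float(np.sum((z - 3.0) ** 2)), p0=[0.0, 0.0])
print(p_opt, f_opt)
```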

In this chapter, the computational time of the GA is compared with those of the grid search and direct search algorithms. In our experiments, the initial parameter ranges are determined from Table 7.2. In the grid search method, the unit grid size is 0.5. In the direct search algorithm, the maximum number of iterations is 100. The parameter settings of the GA are similar to those of the previous experiment, as described before Table 7.2. The programs are run on an IBM T60 notebook with a Pentium IV CPU running at 1.8 GHz and 2048 MB RAM, and all programs are implemented in Matlab. The experiments with the three parameter search methods use identical training and testing sets with five-fold cross-validation. The average classification accuracy and computational time of the three methods are shown in Table 7.3.

From Table 7.3, we can find that for the different credit datasets, the parameter search efficiency and performance are similar. Furthermore, there is no significant difference among the three parameter search methods in average prediction performance. However, the computational time of each parameter search algorithm is distinctly different. Among the three methods, the CPU time of the grid search algorithm is the longest for all three credit datasets, followed by the genetic algorithm; the direct search method has the shortest CPU time. Fortunately, the classification performance of the GA is slightly better than that of direct search, except in the case of Dataset 2.

Table 7.3. Computational performance comparisons using different parameter search methods for three credit datasets

Dataset  Parameter search method   Prediction performance (%)   CPU time (s)
1        Grid Search               79.07                        10221.33
         Direct Search             79.13                        1092.65
         Genetic Algorithm         79.48                        2212.56
2        Grid Search               71.97                        11310.42
         Direct Search             72.37                        1200.51
         Genetic Algorithm         72.25                        2423.55
3        Grid Search               77.31                        11789.45
         Direct Search             77.02                        1298.26
         Genetic Algorithm         77.18                        2589.98

In addition, to fully measure the prediction and exploration power of the proposed evolving LSSVM learning paradigm, it needs to be compared further with other classification models, which is done in the following subsection.