

1.3 Literature Investigation and Analysis

1.3.3 Comparisons of Models

18 1 Credit Risk Analysis with Computational Intelligence: A Review

In general, each method has its own advantages and disadvantages for building a credit risk evaluation model. Numerous experiments have found that artificial intelligence (AI) technologies can automatically extract knowledge from a dataset and build a model without requiring a deep grasp of the problem from the user, whereas statistical and mathematical programming methods must construct the model on the basis of certain assumptions and estimate parameters to fit the data. Relative to AI techniques, statistical and mathematical programming models are comparatively simple, and they make it easy for the decision maker to explain a rejection to an applicant. In addition, mathematical programming can easily accommodate different objectives when dealing with practical business problems, so it offers good flexibility in model construction. In many circumstances AI technologies outperform traditional methods in terms of classification accuracy, but their complex structure or configuration can keep them from achieving optimal results, so it is difficult to determine an overall 'best' model for all situations. For the readers' convenience, we roughly compare these methods in terms of four criteria: accuracy, interpretability, simplicity, and flexibility, as shown in Table 1.2. Note that neural networks, support vector machines, and hybrid/ensemble models usually obtain the best performance.

Table 1.2. Comparisons of different credit risk models based on different criteria

Methods         | Accuracy | Interpretability | Simplicity | Flexibility
LDA, LOG, PR    | ★★       | ★★★              | ★★★        | ★
DT              | ★★       | ★★★              | ★★         | ★
KNN             | ★★       | ★                | ★          | ★
LP              | ★★       | ★                | ★★         | ★
NN              | ★★★      | ★★               | ★          | ★★★
EA              | ★        | ★★★              | ★          | ★★★
RS              | ★★★      | ★                | ★          | ★
SVM             | ★★★      | ★                | ★          | ★★
Hybrid/ensemble | ★        | ★★★              | ★★★        | ★

As can be seen from Table 1.2, each model has its own strengths on the four criteria. Although classification accuracy is not the only criterion for measuring a model's performance, the number of customers and the size of loan portfolios under management in today's credit industry are growing steadily, so even a fraction of a percent improvement in credit evaluation accuracy is significant. In this sense, developing approaches with higher accuracy is the most important goal for credit risk evaluation models.

For further explanation, we summarize 32 papers that cover almost all of the above-mentioned methods and list their performance in Table 1.3. If one model is tested on several different datasets, we will list

its minimum and maximum accuracy. If the same type of model has different parameter settings, the one with the best performance is listed.
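The min-max listing convention just described can be sketched in a few lines; the accuracy values below are illustrative placeholders, not figures taken from Table 1.3.

```python
# Summarize reported accuracies per model as "min-max", keeping a single
# figure when a model was tested on only one dataset -- the convention
# used in Table 1.3. The numbers below are illustrative placeholders.
results = {
    "SVM": [71.2, 80.4, 89.5],   # accuracies (%) on several datasets
    "LOG": [72.0, 89.2],
    "PR":  [84.87],              # tested on a single dataset
}

def min_max(accs):
    """Render a list of accuracies the way Table 1.3 lists them."""
    lo, hi = min(accs), max(accs)
    return f"{lo:.2f}" if lo == hi else f"{lo:.2f}-{hi:.2f}"

for model, accs in results.items():
    print(model, min_max(accs))
```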

Since all the quantitative models mentioned above are data-driven, their performance is very sensitive to the dataset. From all the papers investigated in Table 1.3, it can be observed that the degree of difficulty of mining a dataset is roughly the same for all models: a dataset that is difficult for one model is difficult for the others, and a dataset that is easy for one model is easy for the others. For example, we choose eight models that were tested on eight datasets in Baesens' study (Baesens et al., 2003) to demonstrate this observation, as shown in Fig. 1.4.
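The observation that dataset difficulty carries over between models can be checked with a rank correlation of two models' accuracies across datasets. The sketch below uses only the standard library; the accuracy vectors are made-up placeholders, not Baesens et al.'s figures.

```python
# If a dataset that is hard for one model is hard for the others, the
# models' accuracy vectors across datasets should be strongly rank-
# correlated. Placeholder accuracies (%) for two models on five
# hypothetical datasets:
nn  = [72.4, 89.4, 66.9, 84.7, 77.8]
svm = [71.2, 89.5, 68.0, 83.5, 78.8]

def ranks(xs):
    """Rank positions 1..n of each value (assumes no ties)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    for rank, i in enumerate(order, start=1):
        r[i] = float(rank)
    return r

def spearman(xs, ys):
    """Spearman's rho = Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

print(spearman(nn, svm))  # → 1.0: both models rank the datasets identically
```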

Fig. 1.4. Performance comparison of different models on different credit datasets

Although the performance of all the quantitative models depends on the quality of the dataset, their capabilities to mine the inherent relationships in the data differ. From the investigation of the 32 papers in Table 1.3, support vector machines consistently achieve the best accuracy among the top methods. However, besides the quality of the dataset, several other factors affect the performance of support vector machines, such as the type of SVM model, the kernel function, parameter selection, and the classifier function or cut-off criterion. For this reason, we investigated 12 journal articles that discuss SVMs for building credit scoring models. The results are shown in Table 1.4.
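Grid search, the parameter-selection strategy most studies in Table 1.4 report, can be sketched with scikit-learn (an assumed dependency here; the synthetic data stands in for a real credit dataset, and the (C, gamma) grid is illustrative):

```python
# Grid search for an RBF-kernel SVM: scan (C, gamma) combinations with
# cross-validation and keep the one with the best held-out accuracy.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic stand-in for a credit dataset: 500 applicants, 10 features,
# binary good/bad label.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

grid = {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1, 1]}
search = GridSearchCV(SVC(kernel="rbf"), grid, cv=5, scoring="accuracy")
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 4))
```

Trial-and-error and particle swarm optimization (also listed in Table 1.4) differ only in how the (C, gamma) candidates are generated, not in how each candidate is scored.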


Table 1.3. Accuracy comparison of different quantitative models

Papers surveyed (No. 1-16): 1 Baesens et al. 2005a; 2 Baesens et al. 2003a; 3 Baesens et al. 2003b; 4 Bedingfield and Smith 2003; 5 Chen and Huang 2003; 6 Daubie et al. 2002; 7 Desai et al. 1997; 8 Desai et al. 1996; 9 Galindo and Tamayo 2000; 10 Gao et al. 2006; 11 Van Gestel et al. 2006; 12 Harmen and Leon 2005; 13 Hsieh 2004; 14 Hsieh 2005; 15 Huang et al. 2007; 16 Huang et al. 2006.

Reported accuracies (%) by model across these papers:
LDA: 72.2-88.6; 86.09; 66.53; 82.35; 86.49
LOG: 68.60-78.24; 72.0-89.2; 67.30; 82.67; 86.49; 78.4; 76.42-86.19
PR: 84.87
DT: 67.1-90.4; 70.03-74.25; 87.78; 87.50; 91.7; 73.60-85.90; 70.59-87.06
K-NN: 66.7-89.5; 85.05; 68.43-68.42
LP: 71.2-89.5
NN: 66.93-78.58; 72.4-89.4; 71.85-77.84; 87.92; 66.38; 83.19; 89.0; 84.7; 96.16; 77.83-86.83; 75.51-87.93
ET: 70.5; 65.70; 81.6; 78.10-87.00; 77.34-88.27
RS: 76.78
SVM: 71.2-89.5; 89.34; 75.0
HY: 67.24-77.25; 85.3; 89.34; 97.99-98.46; 77.92-86.90; 79.49-89.17


Table 1.3. Accuracy comparison of different quantitative models (continued)

Papers surveyed (No. 17-32): 17 Huang et al. 2006; 18 Huysmans et al. 2005; 19 Jagielska and Jaworski 1996; 20 Jiang and Yuan 2007; 21 Lai et al. 2006c; 22 Lee and Chen 2005; 23 Lee et al. 2006; 24 Li et al. 2006; 25 Martens et al. 2007; 26 Mirta et al. 2005; 27 Ong et al. 2005; 28 Schebesch and Stecking 2007; 29 Schebesch and Stecking 2005a; 30 Sun and Yang 2006; 31 Wang et al. 2005; 32 West 2000.

Reported accuracies (%) by model across these papers:
LDA: 75.49; 69.00
LOG: 60.66; 76.08; 70.90; 85.70-96.40; 76.32; 75.40-86.19; 64.18-82.53; 76.30-87.25
DT: 82.80; 68.9-69.8; 77.95; 80.20-94.6; 65.79; 74.57-87.06; 69.56-84.38
K-NN: 67.60-85.80
NN: 82.88; 82.0; 92.63; 78.36; 87.58; 73.85; 73.17; 84.21; 75.51-87.93; 82.50; 62.13-81.45; 74.60-87.14
ET: 77.34-88.27
RS: 74.57-83.72
SVM: 83.49; 84.34; 85.70-97.00; 75.08; 72.64; 95.00; 64.54-78.8
HY: 71.3-71.9; 94.14; 93.27; 87.36; 82.00-96.20; 66.17-83.94


Table 1.4. SVM models and their factors

No | Author & Year | Model | Kernel function | Toolkit | Parameter search | Cut-off
1 | Baesens et al. 2003a | 1-norm, LS-SVM | RBF, Linear | Cawley, LS-SVM | Grid search | 0
2 | Schebesch and Stecking 2005a | 1-norm | RBF | SVM-light based | Grid search | Bayesian rule
3 | Schebesch and Stecking 2005b | 1-norm | RBF | Unknown | Grid search | Area dependent
4 | Wang et al. 2005 | 1-norm, LS-SVM, Fuzzy SVM | Linear, Poly, RBF | Unknown | Trial and error | 0
5 | Stecking and Schebesch 2006 | 1-norm | Lin, Poly, RBF, Sig, Coulomb | Unknown | Unknown | 0
6 | Xiao et al. 2006 | 1-norm | Lin, Poly, RBF, Sig | Lib-SVM | Unknown | 0
7 | Jiang and Yuan 2007 | 1-norm | RBF | Unknown | Particle Swarm Optimization | 0
8 | Stecking and Schebesch 2007 | Combining SVMs | Lin, Poly, RBF, Sig, Coulomb | Unknown | Unknown | 0
9 | Yu et al. 2007a | LS-FSSVM, FSSVM, LS-SVM, 1-norm | RBF | Unknown | Unknown | 0
10 | Van Gestel et al. 2006 | LS-SVM, LS-SVMbay | RBF | LS-SVM based | Grid search | 0
11 | Harmen and Leon 2005 | LS-SVM | RBF | Unknown | Grid search | 0
12 | Martens et al. 2007 | 1-norm | RBF | Unknown | Grid search | 0
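The cut-off of 0 used by most studies in Table 1.4 means the sign of the SVM decision function separates accepted from rejected applicants. A minimal stdlib sketch with a linear kernel for brevity; the weights and bias are hypothetical, not taken from any of the surveyed papers.

```python
# Classify with a cut-off on the SVM decision value f(x) = w.x + b:
# f(x) >= cut-off -> good applicant, otherwise bad. The weight vector
# and bias below are hypothetical illustration values.
w = [0.8, -0.5, 0.3]   # learned weight vector (hypothetical)
b = -0.1               # bias term (hypothetical)

def decision(x):
    """SVM decision value for a feature vector x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(x, cutoff=0.0):
    # Moving the cut-off away from 0 (e.g. the Bayesian rule or an
    # area-dependent threshold in Table 1.4) trades false accepts
    # against false rejects.
    return "good" if decision(x) >= cutoff else "bad"

print(classify([1.0, 0.2, 0.5]))  # → good (decision value 0.75)
print(classify([0.1, 1.0, 0.1]))  # → bad (decision value -0.49)
```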