the population. Because we do not know the percentage of good customers in the population, we substitute the percentage of good customers in the training dataset, which means the cutoff will be adjusted such that the top (N+/N) × 100% of instances are accepted and the others are refused, where N+ is the number of good instances and N the total number of instances in the training dataset. Assume the test dataset has T instances. We sort the decision values of all instances in ascending order and label the Round(T × N+/N) instances with the highest values as good customers and the others as bad customers, where Round(·) rounds its argument to the nearest integer. The classification performance is measured by the Type1 accuracy, Type2 accuracy and overall accuracy, which are the percentage of good customers correctly classified, the percentage of bad customers correctly classified, and the percentage of all instances correctly classified, respectively.
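As an illustration of this cutoff rule, here is a minimal Python sketch (our own reconstruction for exposition, not code from the original study; the function name and arguments are hypothetical):

import numpy as np

def cutoff_classify(decision_values, y_true, n_good_train, n_train):
    """Label the top Round(T * N+/N) test instances (by decision value) as
    good (1) and the rest as bad (0), then compute the three accuracies."""
    t = len(decision_values)
    n_top = int(round(t * n_good_train / n_train))  # Round(T * N+/N)
    order = np.argsort(decision_values)             # ascending decision values
    y_pred = np.zeros(t, dtype=int)
    if n_top > 0:
        y_pred[order[-n_top:]] = 1                  # highest values -> good
    y_true = np.asarray(y_true)
    type1 = np.mean(y_pred[y_true == 1] == 1)       # % of goods classified good
    type2 = np.mean(y_pred[y_true == 0] == 0)       # % of bads classified bad
    overall = np.mean(y_pred == y_true)
    return type1, type2, overall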
To decrease the bias due to the choice of split between training sample and test sample, we randomly divide each dataset into a training set and a test set, train the machines or estimate the parameters, and obtain the 24 models' test results based on this split. The above steps are repeated ten times independently. The final performance evaluation is based on the average results of the ten tests.
Here we use a one-tailed paired t-test to compare the average overall accuracies. Note that the t-test is only a heuristic for comparison here, because the twenty rates (ten pairs) are not independent.
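For instance, the comparison of one model against the best performer over the ten common splits could be sketched as follows (assuming SciPy >= 1.6 for the alternative argument; the accuracy values below are purely hypothetical):

from scipy import stats

# Overall accuracies of two models over the same ten random splits
# (hypothetical numbers for illustration only).
best  = [0.79, 0.80, 0.77, 0.81, 0.78, 0.80, 0.79, 0.78, 0.80, 0.79]
other = [0.73, 0.75, 0.71, 0.74, 0.72, 0.74, 0.73, 0.72, 0.75, 0.73]

# One-tailed paired t-test: is `other` significantly lower than `best`?
t_stat, p_value = stats.ttest_rel(other, best, alternative="less")
print(f"t = {t_stat:.3f}, one-tailed p = {p_value:.4f}")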
6.3.1 Dataset 1: UK Case
The first dataset contains data on UK corporations from the Financial Analysis Made Easy (FAME) CD-ROM database; it can be found in the Appendix of Beynon and Peel (2002). It contains 30 failed and 30 non-failed firms. Twelve variables are used as the firms' characteristics; they are described in Section 4.4.1 of Chapter 4.
In each individual test 40 firms are randomly drawn as the training sample. Because the dataset is small, we make the number of good firms equal to the number of bad firms in both the training and test samples, so as to avoid situations in which only two or three good (or bad, equally likely) instances appear in the test sample. For example, the Type1 accuracy can only be 0%, 50% or 100% if the test dataset includes only 2 good instances. Thus the test dataset has 10 instances of each class. For the ANN, a two-layer feed-forward network with 3 TANSIG neurons in the first layer and one PURELIN neuron is used; the network training function is TRAINLM. In all SVMs the regularization parameter C = 10, with d = 4 for the polynomial kernel and σ² = 1 for the RBF kernel. The above parameters are obtained by trial and error. This test is repeated ten times, and the final Type1 and Type2 accuracies are the averages of the results of the ten individual tests. The computational results are shown in Table 6.1.
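For reference, these SVM settings could be reproduced in scikit-learn roughly as follows. This is a sketch under our assumptions, not the authors' original code; in particular, scikit-learn writes the RBF kernel as exp(−γ‖x−z‖²), so σ² = 1 corresponds to γ = 1/(2σ²) = 0.5 if the kernel denominator is taken as 2σ².

from sklearn.svm import SVC

sigma2 = 1.0  # sigma^2 = 1 for the RBF kernel, as in the text
svms = {
    "Lin":  SVC(kernel="linear", C=10),
    "Poly": SVC(kernel="poly", degree=4, C=10),
    "RBF":  SVC(kernel="rbf", gamma=1.0 / (2 * sigma2), C=10),
}
# Each machine would then be fitted on the 40 training firms:
# svms["RBF"].fit(X_train, y_train)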
Table 6.1. Empirical results on the dataset 1
Model Kernel Membership Type1 Type2 Overall Std Rank
LinR 69.00% 69.00% 69.00% 6.88%
LogR 71.00% 71.00% 71.00% 6.58%
ANN 73.00% 73.00% 73.00% 7.97%
SVM Lin 73.00% 73.00% 73.00% 4.87%
Poly 71.00% 71.00% 71.00% 4.83%
RBF 77.00% 77.00% 77.00% 6.78% 2
U-FSVM Lin Linear regression 69.00% 69.00% 69.00% 4.51%
Lin Logit regression 70.00% 70.00% 70.00% 5.87%
Lin Neural network 71.00% 71.00% 71.00% 6.34%
Poly Linear regression 71.00% 71.00% 71.00% 6.61%
Poly Logit regression 71.00% 71.00% 71.00% 5.79%
Poly Neural network 71.00% 71.00% 71.00% 6.00%
RBF Linear regression 75.00% 75.00% 75.00% 3.53%
RBF Logit regression 76.00% 76.00% 76.00% 5.89%
RBF Neural network 75.00% 75.00% 75.00% 3.03%
B-FSVM Lin Linear regression 69.00% 69.00% 69.00% 5.81%
Lin Logit regression 73.00% 73.00% 73.00% 4.59%
Lin Neural network 73.00% 73.00% 73.00% 4.97%
Poly Linear regression 74.00% 74.00% 74.00% 5.07%
Poly Logit regression 71.00% 71.00% 71.00% 6.03%
Poly Neural network 72.00% 72.00% 72.00% 5.85%
RBF Linear regression 75.00% 75.00% 75.00% 5.91%
RBF Logit regression 79.00% 79.00% 79.00% 5.65% 1
RBF Neural network 77.00% 77.00% 77.00% 6.21% 2
Max 79.00%
Min 69.00%
Average 72.75%
In Table 6.1, the column titled Std contains the sample standard deviations of the 10 overall rates from the 10 independent tests of the method specified by the first three columns. The best average test performance is underlined and denoted in bold face. Bold face is also used for performances that are not significantly lower than the top performance at the 10% level by a one-tailed paired t-test; other performances are set in normal script. The top 3 performances are ranked in the last column. The following two empirical tests are also presented in the same format.
Note that the Type1 rate, Type2 rate and overall rate are identical in every row of Table 6.1. This outcome is due to the fact that the test dataset has 10 positive instances and 10 negative instances, and the cutoff is selected such that the top 50% of instances are labeled as positive and the others negative (the 50% is determined by the class proportions of the training dataset). If the top 10 instances labeled as positive include j positive instances, the other 10 instances labeled as negative include 10−j positive instances and j negative instances, so the Type1 rate, the Type2 rate and the overall rate all equal j/10.
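The identity can be checked mechanically; the snippet below (illustrative only) enumerates all possible values of j:

# For a 10/10 test set with a top-10 cutoff: if j goods land in the
# top 10, then exactly j bads remain in the bottom 10, so the rates match.
for j in range(11):
    type1 = j / 10                # goods labeled good
    type2 = (10 - (10 - j)) / 10  # bads labeled bad: 10 - (10 - j) = j
    assert type1 == type2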
As shown in Table 6.1, the new bilateral-weighted FSVM with RBF kernel and logit regression membership generation achieves the best performance: it correctly classifies 79% of the good firms and 79% of the bad firms. The standard SVM with RBF kernel and the B-FSVM with RBF kernel and neural network membership achieve the second best performance. Among the 24 methods, 19 methods' average overall classification rates are significantly lower than the best performance at the 10% level by a one-tailed paired t-test. The B-FSVM outperforms the U-FSVM in most of the combinations of kernels and membership generating methods.
6.3.2 Dataset 2: Japanese Case
The second dataset concerns Japanese credit card application approval and is obtained from the UCI Machine Learning Repository. For confidentiality, all attribute names and values have been changed to meaningless symbols. After deleting the instances with missing attribute values, we obtain 653 instances, of which 357 cases were granted credit and 296 were refused.
Among the 15 attributes, A4–A7 and A13 are multi-valued. If Ai is a k-valued attribute, we encode it with k−1 0–1 dummy variables in the linear regression and logit regression models, and with a single k-valued variable (taking values from 1 to k) in the ANN and SVMs.
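A brief pandas sketch of the two encodings (the attribute values shown are hypothetical, since the real UCI values are anonymized symbols):

import pandas as pd

df = pd.DataFrame({"A4": ["u", "y", "u", "l"]})  # a hypothetical k-valued attribute

# k-1 0-1 dummy variables, for the linear and logit regression models
dummies = pd.get_dummies(df["A4"], prefix="A4", drop_first=True)

# a single integer code in {1, ..., k}, for the ANN and SVMs
codes = df["A4"].astype("category").cat.codes + 1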
In this empirical test we randomly draw 400 instances from the 653 as the training sample and use the rest as the test sample. For the ANN, a two-layer feed-forward network with 4 TANSIG neurons in the first layer and one PURELIN neuron is used; the network training function is TRAINLM. In all SVMs, the regularization parameter C = 50, with d = 4 for the polynomial kernel and σ² = 5 for the RBF kernel. The above parameters are obtained by trial and error. Table 6.2 summarizes the results of the comparisons.
Table 6.2. Empirical results on the dataset 2
Model Kernel Membership Type1 Type2 Overall Std Rank
LinR 81.21% 82.34% 81.72% 6.58%
LogR 82.27% 82.85% 82.53% 3.38% 3
ANN 80.32% 82.82% 81.45% 6.31%
SVM Lin 78.82% 82.91% 80.68% 3.41%
Poly 74.38% 79.05% 76.50% 4.37%
RBF 76.85% 81.15% 78.80% 5.11%
U-FSVM Lin Linear regression 78.90% 82.90% 80.71% 5.33%
Lin Logit regression 78.30% 82.25% 80.09% 6.69%
Lin Neural network 67.48% 73.63% 70.27% 4.28%
Poly Linear regression 74.58% 78.99% 76.58% 4.94%
Poly Logit regression 73.37% 78.28% 75.60% 5.36%
Poly Neural network 74.28% 79.03% 76.43% 3.58%
RBF Linear regression 77.04% 81.18% 78.91% 3.63%
RBF Logit regression 79.88% 82.71% 81.16% 3.99%
RBF Neural network 75.58% 80.06% 77.61% 3.63%
B-FSVM Lin Linear regression 76.18% 79.39% 77.63% 5.85%
Lin Logit regression 73.08% 78.21% 75.41% 4.42%
Lin Neural network 78.43% 82.48% 80.27% 6.11%
Poly Linear regression 80.16% 83.88% 81.84% 6.56%
Poly Logit regression 82.70% 85.43% 83.94% 4.75% 1
Poly Neural network 77.85% 82.82% 80.10% 3.97%
RBF Linear regression 81.06% 83.04% 81.96% 5.85%
RBF Logit regression 82.37% 85.05% 83.58% 4.82% 2
RBF Neural network 80.86% 82.21% 81.47% 5.34%
Max 83.94%
Min 70.27%
Average 79.39%
Similarly, from Table 6.2 it is easy to see that the B-FSVM with polynomial kernel and logit regression membership generation achieves the best performance. The B-FSVM with RBF kernel and logit regression membership generation and the plain logit regression model achieve the second and third best performances, respectively. The average performances of half of the 24 models are significantly below the best performance at the 10% level by a one-tailed paired t-test. The results also show that the new fuzzy SVM outperforms the standard SVM and the U-FSVM with the same kernel in most cases.
6.3.3 Dataset 3: England Case
The third dataset is from the CD-ROM accompanying Thomas et al. (2002). It contains 1225 applicants, of which 323 are observed bad creditors. In our experiments only 12 variables are chosen, which differs from the choice in Section 5.3 of Chapter 5:
(01). Year of birth
(02). Number of children
(03). Number of other dependents
(04). Is there a home phone (binary)
(05). Spouse's income
(06). Applicant's income
(07). Value of home
(08). Mortgage balance outstanding
(09). Outgoings on mortgage or rent
(10). Outgoings on loans
(11). Outgoings on hire purchase
(12). Outgoings on credit cards
In the dataset the number of healthy cases (902) is nearly three times the number of delinquent cases (323). To make the sizes of the two classes nearly equal, we triple the delinquent data, i.e., we add two copies of each delinquent case, so the total dataset includes 1871 instances. We then randomly draw 800 instances from the 1871 as the training sample and use the rest as the test sample. For the ANN, a two-layer feed-forward network with 4 TANSIG neurons in the first layer and one PURELIN neuron is used; the network training function is TRAINLM. In all SVMs the regularization parameter C = 50, and the kernel parameters are the same throughout: d = 5 for the polynomial kernel and σ² = 3 for the RBF kernel. This trial is repeated ten times. The experimental results are presented in Table 6.3.
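The tripling step can be sketched as follows (our reconstruction; X and y are hypothetical feature and label arrays, with y == 1 marking delinquent cases):

import numpy as np

def triple_minority(X, y, minority_label=1):
    """Append two extra copies of every minority-class row, so that
    902 healthy + 3 * 323 delinquent = 1871 instances in total."""
    idx = np.where(y == minority_label)[0]
    X_out = np.vstack([X, X[idx], X[idx]])
    y_out = np.concatenate([y, y[idx], y[idx]])
    return X_out, y_out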
Table 6.3. Empirical results on the dataset 3
Model Kernel Membership Type1 Type2 Overall Std Rank
LinR 65.74% 62.62% 64.12% 2.94%
LogR 65.79% 62.68% 64.18% 3.08%
ANN 63.82% 60.56% 62.13% 3.18%
SVM Lin 67.06% 63.79% 65.37% 2.13%
Poly 67.01% 63.75% 65.32% 3.83%
RBF 66.26% 62.94% 64.54% 2.11%
U-FSVM Lin Linear regression 63.80% 60.29% 61.98% 2.07%
Lin Logit regression 64.54% 61.11% 62.76% 2.63%
Lin Neural network 63.32% 59.77% 61.48% 4.39%
Poly Linear regression 66.88% 63.61% 65.19% 3.32%
Poly Logit regression 67.06% 63.81% 65.38% 4.09%
Poly Neural network 66.92% 63.66% 65.23% 3.96%
RBF Linear regression 66.81% 63.53% 65.11% 4.26%
RBF Logit regression 67.10% 63.84% 65.41% 4.74% 3
RBF Neural network 66.43% 63.11% 64.71% 3.60%
B-FSVM Lin Linear regression 65.80% 62.45% 64.06% 3.37%
Lin Logit regression 66.03% 62.70% 64.30% 2.34%
Lin Neural network 64.33% 60.87% 62.54% 3.04%
Poly Linear regression 63.00% 59.35% 61.11% 2.84%
Poly Logit regression 63.52% 60.13% 61.76% 3.91%
Poly Neural network 63.41% 59.89% 61.59% 2.19%
RBF Linear regression 67.41% 64.19% 65.74% 2.25% 2
RBF Logit regression 67.83% 64.63% 66.17% 3.13% 1
RBF Neural network 66.76% 63.48% 65.06% 3.43%
Max 66.17%
Min 61.11%
Average 63.97%
As can be seen from Table 6.3, the separability of this dataset is clearly lower than that of the previous two datasets: every model's average overall classification accuracy is below 70%. The best performance is achieved by the B-FSVM with RBF kernel and logit regression membership generation. The B-FSVM with RBF kernel and linear regression membership generation and the U-FSVM with RBF kernel and logit regression membership generation achieve the second and third best performances, respectively. The lower separability can also be seen in the fact that only 8 of the 24 models' average performances are significantly lower than the best performance at the 10% level by a one-tailed paired t-test.
The average performances of the standard SVM, U-FSVM and B-FSVM are quite close in this test, which may be due to poor membership generation: the linear regression, logit regression and ANN models themselves achieve average overall accuracies of only 64.12%, 64.18% and 62.13%, respectively. We believe that bad memberships may provide wrong signals and disturb the training of the SVM, instead of decreasing its vulnerability to outliers, in most of the combinations of kernels and membership generation methods.