Through the above evolutionary procedures, the evolving LSSVM learning paradigm, with a mixed kernel, the best input features and optimal parameters, is produced. For illustration and evaluation purposes, the produced evolving LSSVM learning paradigm can be further applied to credit risk modeling and analysis problems in the following section.
7.4 Research Data and Comparable Models
In this section, the research data and their input features are first described.
Then the comparable classification models are briefly reviewed.
7.4.1 Research Data
In this chapter, three typical credit datasets, namely the England corporation credit dataset (Dataset 1), the England consumer credit dataset (Dataset 2), and the German consumer credit dataset (Dataset 3), are used to test the effectiveness of the proposed evolving LSSVM learning paradigm.
Dataset 1 contains 30 failed and 30 non-failed firms. The 12 variables presented in Section 4.4.1 of Chapter 6 are used as the firms' feature description. For illustrative convenience in this chapter, the 12 features are listed below:
(01) Sales;
(02) ROCE: profit before tax/capital employed (%);
(03) FFTL: funds flow (earnings before tax & depreciation)/total liabilities;
(04) GEAR: (current liabilities + long-term debt)/total assets;
(05) CLTA: current liabilities/total assets;
(06) CACL: current assets/current liabilities;
(07) QACL: (current assets–stock)/current liabilities;
(08) WCTA: (current assets – current liabilities)/ total assets;
(09) LAG: number of days between the account year end and the date the annual report and accounts were filed at the company registry;
(10) AGE: number of years the company has been operating since its incorporation date;
(11) CHAUD: coded 1 if changed auditor in previous three years, 0 otherwise;
(12) BIG6: coded 1 if company auditor is a Big6 auditor, 0 otherwise.
120 7 Evolving Least Squares SVM for Credit Risk Analysis
Similar to Chapter 6, we randomly select 40 firms as the training sample with 20 failed and 20 non-failed firms. Accordingly, the remaining 20 firms with 10 failed and 10 non-failed firms are used as the testing sample.
In Dataset 2, every credit applicant is described by 14 variables, listed in Table 5.1 of Chapter 5. The dataset includes detailed information on 1225 applicants, of which 323 are observed bad applicants. For illustrative convenience in this chapter, the 14 features are again listed below, similar to Dataset 1.
(01) Year of birth
(02) Number of children
(03) Number of other dependents
(04) Is there a home phone
(05) Applicant's income
(06) Applicant's employment status
(07) Spouse's income
(08) Residential status
(09) Value of home
(10) Mortgage balance outstanding
(11) Outgoings on mortgage or rent
(12) Outgoings on loans
(13) Outgoings on hire purchase
(14) Outgoings on credit cards
In the dataset the number of healthy cases (902) is nearly three times the number of delinquent cases (323). To make the sizes of the two classes nearly equal, we triple the delinquent data, i.e., we add two copies of each delinquent case. The total dataset thus includes 1871 cases. We then randomly draw 800 cases from the 1871 as the training sample and use the remainder as the testing sample.
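The duplication and random-split procedure above can be sketched as follows. This is an illustrative sketch only: the arrays are toy stand-ins for the real 14-feature applicant records, with an assumed label column appended (0 = healthy, 1 = delinquent), and the random seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # arbitrary seed for reproducibility

# Toy stand-ins for the real dataset: 902 healthy and 323 delinquent
# applicants, each row holding 14 feature values plus a label column.
healthy = np.zeros((902, 15))          # label 0 in the last column
delinquent = np.ones((323, 15))        # label 1 in the last column

# Triple the delinquent class by adding two copies of each delinquent case.
delinquent_tripled = np.concatenate([delinquent, delinquent, delinquent])
data = np.concatenate([healthy, delinquent_tripled])
assert len(data) == 902 + 3 * 323      # 1871 cases in total

# Randomly draw 800 cases as the training sample; the rest form the test set.
idx = rng.permutation(len(data))
train, test = data[idx[:800]], data[idx[800:]]
print(len(train), len(test))           # 800 1071
```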
The third dataset is the German credit card dataset, which is provided by Professor Dr. Hans Hofmann of the University of Hamburg and is available at the UCI Machine Learning Repository (http://www.ics.uci.edu/~mlearn/databases/statlog/german/). It contains 1000 cases, of which 700 were granted a credit card and 300 were refused. Each case is characterized by 20 feature variables, 7 numerical and 13 categorical, which are described as follows.
(01) Status of existing checking account (categorical)
(02) Duration in months (numerical)
(03) Credit history (categorical)
(04) Purpose (categorical)
(05) Credit amount (numerical)
(06) Savings account/bonds (categorical)
(07) Present employment since (categorical)
(08) Installment rate in percentage of disposable income (numerical)
(09) Personal status and sex (categorical)
(10) Other debtors/guarantors (categorical)
(11) Present residence since (numerical)
(12) Property (categorical)
(13) Age in years (numerical)
(14) Other installment plans (categorical)
(15) Housing (categorical)
(16) Number of existing credits at this bank (numerical)
(17) Job (categorical)
(18) Number of people being liable to provide maintenance (numerical)
(19) Have telephone or not (categorical)
(20) Foreign worker (categorical)
To make the sizes of the two classes nearly equal, we double the bad cases, i.e., we add one copy of each bad case, so the total dataset now has 1300 cases. This processing is similar to that for the second dataset; the main reason for such a pre-processing step is to avoid drawing too many good cases or too few bad cases into the training sample. We then randomly draw 800 cases (400 good and 400 bad) from the 1300 as the training sample, and the remaining 500 cases (300 good applicants and 200 bad applicants) are used as the testing sample.
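The stratified draw for the German dataset can be sketched as below. Again this is only an illustration with toy placeholder arrays (an assumed label column, arbitrary seed), not the real 20-feature records; the counts, however, match those stated in the text.

```python
import numpy as np

rng = np.random.default_rng(seed=42)  # arbitrary seed for reproducibility

# Toy stand-ins: 700 good and 300 bad cases, 20 features plus a label column.
good = np.zeros((700, 21))            # label 0
bad = np.ones((300, 21))              # label 1

# Double the bad class: one extra copy of each bad case -> 600 bad cases.
bad_doubled = np.concatenate([bad, bad])
assert len(good) + len(bad_doubled) == 1300

# Draw 400 good and 400 bad cases for training; the remaining 500 cases
# (300 good, 200 bad) form the testing sample.
good_idx = rng.permutation(len(good))
bad_idx = rng.permutation(len(bad_doubled))
train = np.concatenate([good[good_idx[:400]], bad_doubled[bad_idx[:400]]])
test = np.concatenate([good[good_idx[400:]], bad_doubled[bad_idx[400:]]])
print(len(train), len(test))          # 800 500
```

Drawing the two classes separately (rather than one pooled draw of 800) is what guarantees the exact 400/400 split in the training sample.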
It is worth noting that this data split is used only for the final validation and testing purposes of the later experiments. For the feature selection and parameter optimization procedures, a k-fold cross-validation split method is used, which will be described later.
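A minimal sketch of such a k-fold split, assuming the 800-case training sample and k = 5 (the chapter has not yet fixed k at this point, so both numbers are illustrative):

```python
import random

def kfold_indices(n, k, seed=0):
    """Split indices 0..n-1 into k shuffled, nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

# Example: 5-fold split of the 800 training cases. In each round one fold
# serves as the validation set and the other folds are used for fitting.
folds = kfold_indices(800, 5)
for fold in folds:
    held_out = set(fold)
    fit_set = [j for j in range(800) if j not in held_out]
    assert len(fit_set) + len(fold) == 800

print([len(f) for f in folds])  # [160, 160, 160, 160, 160]
```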
7.4.2 Overview of Other Comparable Classification Models

In order to evaluate the classification ability of the proposed evolving LSSVM learning paradigm, we compare its performance with those of conventional linear statistical methods, such as the linear discriminant analysis (LDA) model, and nonlinear intelligent models, such as the back-propagation neural network (BPNN) model, the standard SVM model, the individual LSSVM model without GA-based input feature selection, and the individual LSSVM model without GA-based parameter optimization. Typically, we select the LDA model, the individual BPNN model and the standard SVM model with full feature variables as the benchmarks. For further comparison, the individual LSSVM model with polynomial kernel (LSSVMpoly), the individual LSSVM model with RBF kernel (LSSVMrbf), the individual LSSVM model
with sigmoid kernel (LSSVMsig), the individual LSSVM model with a mixed kernel (LSSVMmix), the individual LSSVM model with GA-based input feature selection only (LSSVMgafs) and the individual LSSVM model with GA-based parameter optimization only (LSSVMgapo) are also constructed. We do not compare our proposed learning paradigm with a standard logit regression model because a logit regression model is a special case of a single BPNN model with one hidden node.
The LDA model [2] can handle cases in which the within-class frequencies are unequal, and its performance has been examined on randomly generated test data. This method maximizes the ratio of between-class variance to within-class variance in any particular data set, thereby guaranteeing maximal separability. Usually, an LDA model with d-dimensional input features takes the following form:
z(x) = sgn(a_0 + \sum_{i=1}^{d} a_i x_i)  (7.13)
where a_0 is the intercept, x_i is the ith input feature variable, and a_i (i = 1, 2, ..., d) are the coefficients of the related variables. Each LDA model is estimated on the in-sample data. Model selection is then performed using an empirical evaluation criterion, e.g., RMSE, computed on the out-of-sample data.
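The linear form of Eq. (7.13) can be made concrete with a small sketch using scikit-learn's LDA implementation. The data here are synthetic stand-ins (two Gaussian classes with d = 12 features, matching Dataset 1's dimensionality); the point is only that the fitted classifier reduces to a sign applied to an intercept plus a weighted sum of the features.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(seed=0)

# Synthetic two-class data standing in for the credit features:
# class 0 (non-failed) and class 1 (failed), d = 12 features.
X = np.vstack([rng.normal(0.0, 1.0, size=(40, 12)),
               rng.normal(1.5, 1.0, size=(40, 12))])
y = np.array([0] * 40 + [1] * 40)

lda = LinearDiscriminantAnalysis().fit(X, y)

# The fitted discriminant is exactly a_0 + sum_i a_i * x_i, as in Eq. (7.13).
manual = X @ lda.coef_.ravel() + lda.intercept_[0]
assert np.allclose(manual, lda.decision_function(X))

pred = np.where(manual > 0, 1, 0)   # the sgn(.) step of Eq. (7.13)
print((pred == y).mean())           # in-sample accuracy
```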
The BPNN model is widely used and has produced successful learning and generalization results in various research areas. Usually, a BPNN is trained on historical data: the model parameters (connection weights and node biases) are adjusted iteratively by a process of minimizing the forecasting errors. For classification purposes, the final computational form of the BPNN model can be written as
z(x) = sgn(a_0 + \sum_{j=1}^{q} w_j f(a_j + \sum_{i=1}^{p} w_{ij} x_i) + \xi)  (7.14)

where a_j (j = 0, 1, 2, ..., q) is the bias on the jth unit, w_j (j = 1, 2, ..., q) and w_{ij} (i = 1, 2, ..., p; j = 1, 2, ..., q) are the connection weights between layers of the model, x_i (i = 1, 2, ..., p) are the input feature variables, f(.) is the transfer function of the hidden layer, p is the number of input nodes and q is the number of hidden nodes. Generally, the number of input feature variables is equal to the number of input nodes of the BPNN. By trial and error, we set the number of training epochs to 500 and heuristically determine the number of hidden nodes using the formula (2 x node_in +/- 1), where node_in represents the number of input nodes. The learning rate is 0.25 and the momentum factor is 0.30. The hidden nodes use the sigmoid transfer function and the output node uses the linear transfer function.
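The BPNN configuration above can be approximated with scikit-learn's MLPClassifier, as a rough sketch on synthetic stand-in data. Note two assumptions: the data and seed are illustrative, and MLPClassifier applies a logistic output unit for classification rather than the linear output node described in the text, so this reproduces the hidden-layer settings (sigmoid units, 2 x node_in + 1 nodes, learning rate 0.25, momentum 0.30, 500 epochs) but not the output transfer function exactly.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(seed=1)

# Synthetic stand-in data: p = 12 input features, two classes.
X = rng.normal(size=(80, 12))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

p = X.shape[1]
hidden = 2 * p + 1   # heuristic 2 x node_in + 1 (the text allows 2 x node_in +/- 1)

# Chapter settings: sigmoid ('logistic') hidden units, 500 training epochs,
# learning rate 0.25, momentum 0.30, gradient-descent training ('sgd').
bpnn = MLPClassifier(hidden_layer_sizes=(hidden,), activation='logistic',
                     solver='sgd', learning_rate_init=0.25, momentum=0.30,
                     max_iter=500, random_state=1)
bpnn.fit(X, y)
print(bpnn.score(X, y))   # in-sample accuracy
```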
For the standard SVM model with all input features, the radial basis function (RBF) is used as the kernel function of SVM. In the standard SVM model