Binary classification - GP and Arithmetic Trees

4.3 GP and Arithmetic Trees

4.3.1 Binary classification

ones. The primary aspect to be extracted from the previous studies is that decision trees are easily interpreted, and consequently they are the favoured representation as researchers are easily able to understand the classification models evolved.

CHAPTER 4. GP AND DATA CLASSIFICATION 63

Figure 4.6: Illustrating how to map the output of a GP tree onto two classes using a threshold value. In this figure, the threshold is 0.5.

Additionally, in the study of Etemadi et al. the function set used was {+, −, ×, ^, NOT, LT}. The NOT operator has an arity of one and is applied to an attribute. It returns the result obtained by subtracting the attribute from 1.

The LT operator has an arity of 2 and is applied to two variables. It returns a value of 1 if the first attribute is smaller than the second attribute, and if not, 0 is returned. The attributes and constants formed the terminal set. The data set was split with 72% of the instances in the training set, and 28% in the test set.

Crossover was applied with a probability of 0.6 and mutation with a probability of 0.06. No further details regarding other GP parameters were given. The proposed GP approach was compared to a multiple discriminant analysis model developed by the authors; GP obtained better accuracy.
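
The two non-arithmetic operators described above can be sketched in a few lines. This is only an illustration: the function names, the example expression, and the tie-breaking rule are assumptions, and the threshold of 0.5 is borrowed from Figure 4.6 rather than taken from the study of Etemadi et al.

```python
def op_not(a: float) -> float:
    """NOT as described in the text: subtract the attribute from 1."""
    return 1.0 - a


def op_lt(a: float, b: float) -> float:
    """LT as described in the text: 1 if the first argument is smaller, else 0."""
    return 1.0 if a < b else 0.0


def classify(output: float, threshold: float = 0.5) -> int:
    # Assumption: an output equal to the threshold is assigned to class 1.
    return 1 if output >= threshold else 0


def example_tree(x0: float, x1: float) -> float:
    # Hypothetical evolved expression LT(x0, NOT(x1)); not taken from the study.
    return op_lt(x0, op_not(x1))


if __name__ == "__main__":
    for x0, x1 in [(0.2, 0.3), (0.9, 0.5)]:
        out = example_tree(x0, x1)
        print(x0, x1, out, classify(out))
```

Because LT only returns 0 or 1, a tree rooted at LT always lands clearly on one side of a 0.5 threshold.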

Gray et al. [100] evolved GP trees in order to classify brain tumours. GP was tested on a single binary data set, and a threshold of 0 separated the output of the trees. For a given tree, a positive output represented the non-meningioma class, and a negative output represented the meningioma class. The function set used was {+, −, ×, /, tan, myAND, myOR, myNOT}, where the logical operators return either 0 or 1 based on their logical evaluation. The researchers point out that the tan function was not present in the best individual and that only three attributes were present. This indicates that GP is indirectly able to perform the task of feature selection and to make use of the most relevant attributes. The fitness proportionate selection method was used, and the population was generated using the ramped half and half method. The terminal set was composed of the attributes. Two experiments were performed on a single data set which is not publicly available. In the first experiment, the entire data set was used as the training set, while the second experiment used 87% of the data for training, and the remaining 13% for testing.

Seven hundred trees were evolved over 41 generations in the first experiment, and 200 were evolved over 20 generations in the second. No reason is given for varying the population size and the number of generations. The standardised fitness was used as a measure of fitness, and was defined as the total number of instances minus the total number of correctly classified instances.
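
The standardised fitness defined above can be written out as a minimal sketch. The label encoding (1 for non-meningioma, 0 for meningioma) and the treatment of an output of exactly 0 are assumptions made here for illustration.

```python
from typing import Sequence


def standardised_fitness(outputs: Sequence[float],
                         labels: Sequence[int],
                         threshold: float = 0.0) -> int:
    """Total number of instances minus the number correctly classified.

    Lower is better; 0 means every instance is classified correctly.
    Assumed encoding: 1 = non-meningioma (positive output),
    0 = meningioma (negative output). An output equal to the threshold
    is treated as class 0, which is also an assumption.
    """
    correct = 0
    for out, label in zip(outputs, labels):
        predicted = 1 if out > threshold else 0
        if predicted == label:
            correct += 1
    return len(labels) - correct


if __name__ == "__main__":
    tree_outputs = [1.5, -0.2, 0.7, -3.0]
    true_labels = [1, 0, 0, 0]
    print(standardised_fitness(tree_outputs, true_labels))
```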

Bhowan et al. [101] evolved 500 GP trees over 50 generations in order to solve classification problems with unbalanced data sets. A threshold value of 0 was chosen to distinguish between the two classes. The function set used was {+,−,×, /, if}.

Attributes and random constants formed the terminal set. The if function takes three arguments which represent three branches in the node. The first argument is evaluated, and the second branch is followed if the argument is evaluated to a negative value, and the third branch is followed otherwise. Crossover, mutation and elitism [102] had application rates of 60%, 35% and 5% respectively. Tournament selection with a size of 7 was used. The proposed approach was tested on six publicly available data sets from the UCI repository, with 50% of the data used for testing.
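
The behaviour of the if function can be illustrated as follows. This sketch passes already-evaluated branch values rather than subtrees, and the example expression is hypothetical; neither detail is taken from the study of Bhowan et al.

```python
def gp_if(condition: float, negative_branch: float, other_branch: float) -> float:
    """Return the second argument if the first is negative, else the third."""
    return negative_branch if condition < 0 else other_branch


def example_tree(x0: float, x1: float) -> float:
    # Hypothetical expression: if(x0 - x1, x0 * 2, x1 + 1)
    return gp_if(x0 - x1, x0 * 2.0, x1 + 1.0)


def classify(output: float) -> int:
    # Threshold of 0 as in the text; treating an output of exactly 0 as
    # class 0 is an assumption.
    return 1 if output > 0 else 0


if __name__ == "__main__":
    print(example_tree(-3.0, 1.0), classify(example_tree(-3.0, 1.0)))
    print(example_tree(2.0, 1.0), classify(example_tree(2.0, 1.0)))
```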

Hennessy et al. [103] investigate the use of GP to evolve 2000 trees over 50 generations for a binary classification task of determining whether or not a solvent is present in a mixture of solvents in Raman spectra. The data set used had 1024 attributes and only 24 instances. The training set contained 58% of the instances, and the remaining 42% was used for testing. The function set {+, −} was used.

Hennessy et al. state that other functions could have been used; however the two selected functions were sufficient in order to achieve high accuracy. The attributes formed the terminal set. A threshold value of 0 was applied in order to map the output of an individual to either presence or absence of a solvent. The fitness function chosen takes two aspects into consideration. The first is the classification accuracy, and if all the training instances are correctly classified, the second aspect is examined. This second aspect is a measure of certainty, and is determined by finding the minimum absolute value of the output on all the training instances for a given individual. The authors do not compare the performance of the proposed approach without the measure of certainty, thus it is unclear as to whether or not the measure of certainty impacted the overall performance of the GP algorithm.
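
One way to read the two-aspect fitness described above is as a pair that is compared lexicographically. The sketch below follows that reading; the tuple representation, the label encoding (+1 for presence, −1 for absence), and the certainty value of 0 for imperfect individuals are assumptions, since the text does not state how the two aspects are combined.

```python
from typing import Sequence, Tuple


def two_stage_fitness(outputs: Sequence[float],
                      labels: Sequence[int]) -> Tuple[float, float]:
    """Return (accuracy, certainty); larger tuples are fitter.

    Certainty is the minimum absolute output over the training instances
    and is only examined when every instance is classified correctly.
    """
    correct = sum(1 for out, y in zip(outputs, labels) if (out > 0) == (y > 0))
    accuracy = correct / len(labels)
    if correct < len(labels):
        # Assumption: certainty is ignored (set to 0) unless accuracy is perfect.
        return (accuracy, 0.0)
    certainty = min(abs(out) for out in outputs)
    return (accuracy, certainty)


if __name__ == "__main__":
    perfect = two_stage_fitness([2.0, -0.5, 3.0], [1, -1, 1])
    imperfect = two_stage_fitness([2.0, 0.5], [1, -1])
    print(perfect, imperfect, perfect > imperfect)
```

Under this reading, two individuals that both classify the training set perfectly are ranked by how far their least confident output lies from the threshold of 0.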

The work by Li and Ciesielski [104] shows that by modifying the function and terminal set, GP can be applied to different problem domains. In this study, GP is used to evolve 100 trees over 2000 generations in order to distinguish between squares and circles within an image classification context. Although this dissertation does not deal with image processing, image classification is a data classification problem in the sense that each pixel in the image can be converted into numerical data. Li and Ciesielski investigate the use of loops. A loop takes three arguments, where the first two correspond to positions within an image, and the third corresponds to a function which will be applied to the data between the two positions. Two loop functions were researched, namely the PlusMethod and the MinusMethod functions. For example, if the loop is executed with positions 10 and 15 on a particular image using the PlusMethod function, then the result will be a numerical value representing the sum of the pixels between those two positions. The function set was {+, −, ForLoop}, where + and − denote mathematical addition and subtraction. The terminal set used was {RandDouble, RandPosition, PlusMethod, MinusMethod}, where RandDouble generates a random double between 0 and 100, and RandPosition generates a random integer between 0 and 255. Mutation had an application rate of 28%, with crossover and elitism having rates of 70% and 2% respectively. The initial population was generated using ramped half and half with a maximum tree depth of 7.

The fitness proportionate selection method was used. A threshold value of 0 was applied, whereby a positive output from a GP individual represented a square, and a negative output represented a circle. The images did not come from a benchmark data set, but were instead generated.
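
The loop construct can be sketched as below. Several details are assumptions made for illustration: the image is taken to be a flattened sequence of 256 pixel values (matching the RandPosition range of 0 to 255), both end positions are included, and MinusMethod is assumed to mirror PlusMethod by subtracting each pixel, since its behaviour is not described in the text.

```python
from typing import Callable, Sequence


def plus_method(accumulator: float, pixel: float) -> float:
    """PlusMethod: add the pixel value to the running total."""
    return accumulator + pixel


def minus_method(accumulator: float, pixel: float) -> float:
    # Assumption: the mirror image of PlusMethod (subtract each pixel).
    return accumulator - pixel


def for_loop(image: Sequence[float], start: int, end: int,
             method: Callable[[float, float], float]) -> float:
    """Apply `method` to every pixel between two positions (inclusive)."""
    low, high = sorted((start, end))  # tolerate positions given in either order
    accumulator = 0.0
    for position in range(low, high + 1):
        accumulator = method(accumulator, image[position])
    return accumulator


if __name__ == "__main__":
    image = [float(i) for i in range(256)]  # hypothetical flattened image
    print(for_loop(image, 10, 15, plus_method))
    print(for_loop(image, 10, 15, minus_method))
```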

GP has been applied to the classification task of determining if two proteins interact or not in [105]. Garcia et al. evolved 1000 trees over 50 generations. The maximum tree depth was set to 17 and tournament selection of size 7 was used.

Accuracy was used as the fitness function. Crossover, mutation, and reproduction had application rates of 50%, 40% and 10% respectively. Preliminary runs were performed in order to optimise the parameters. A threshold value of 0.5 was applied when comparing two proteins using the following formulation:

if (function) ≥ 0.5 then the two proteins interact functionally, else the proteins do not interact

whereby the parameter “function” in the if statement represents a GP tree. The function set was {+, −, ×, /, ≥}, and the terminal set was composed of the attributes, along with an ephemeral random constant in the range [0, 1].

The data set was split evenly with 50% of the data used for training and 50% for testing.

Agnelli et al. [106] use GP to evolve 5000 trees over 50 steady state generations in order to classify segments from 102 scanned documents. Segments of images and text were extracted from these documents, resulting in a total of 821 instances of data. The aim was to distinguish between textual and graphic segments. The trees were trained to allocate a positive value for image segments and a negative value for text segments, thus the threshold was set to zero. The GP approach was tested on a single data set and obtained high classification accuracy. The initial population was generated using the ramped half and half method, with a maximum tree depth of 17. Tournament selection with a size of 7 was used. Crossover had an application rate of 90%, and mutation 10%. The function set was {+, −, ×, /, 2^x, if} together with unary minus, and an ephemeral random constant in the range [0, 11] was also used. The attributes formed the terminal set.

A population of 4000 GP trees was evolved over 100 generations by Topon and Iba [107]. In this study GP was applied to a binary classification problem in order to distinguish between systemic sclerosis and normal biopsies. Arithmetic trees were evolved using the function set {+, −, ×, /, ^, √}. The attributes formed the terminal set. Each individual represented a rule in the form of if (expression ≥ 0) then systemic sclerosis, else normal. The initial population was created using the ramped half and half method, with a maximum depth of 7. The probability of applying crossover was 0.9, mutation 0.1, and reproduction 0.1. The fitness function took into consideration the correlation between the tree’s output and the correct output. Greedy over-selection [3] was used when selecting parents for the crossover operator. When several individuals are considered as parents under greedy over-selection, the fittest individuals have a greater chance of being selected than the others. The algorithm was tested on a single data set having 27 instances; 81% of the instances were used for training and 19% for testing.

Arcanjo et al. [108] evolved 100 GP trees over 50 generations. In this proposed method a sigmoid function maps the output of a tree onto the range (0, 1). A threshold value of 0.5 was chosen to discriminate between the two classes by determining whether the result of the sigmoid function is less than or greater than the threshold.

The threshold value was determined through experimentation. The function set was {+, −, ×, /}. The attributes and an ephemeral random constant (ranging between −9 and 9) were used as the terminal set. The initial population was created using the full and grow method with a maximum depth of 5. Tournament selection with a size of 3 was used. Crossover had an application rate of 85%, and mutation 5%; elitism was also used. The GP approach was tested on 8 data sets taken from the UCI repository.
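
The sigmoid mapping used by Arcanjo et al. can be sketched as follows, assuming the standard logistic function 1 / (1 + e^(−z)); the exact form used in the study is not given in the text, and assigning an output of exactly 0.5 to class 0 is also an assumption.

```python
import math


def sigmoid(z: float) -> float:
    """Standard logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def classify(tree_output: float, threshold: float = 0.5) -> int:
    # Assumption: values strictly above the threshold map to class 1.
    return 1 if sigmoid(tree_output) > threshold else 0


if __name__ == "__main__":
    for raw in (-2.0, 0.0, 2.0):
        print(raw, sigmoid(raw), classify(raw))
```

Since the logistic function is monotonic and equals 0.5 at an input of 0, a threshold of 0.5 on the squashed value separates the classes at the same point as a threshold of 0 on the raw tree output.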

Zhang and Wong [109] investigate the use of online simplification. Simplification is used in order to reduce the complexity of the classifier by reducing the number of nodes. This makes the classifier easier to interpret and faster to evaluate. It can be applied after or during the evolutionary process. The simplification process was achieved through simplification rules which were defined prior to the evolutionary process. An example of such a rule is to reduce “a − 0” to “a”. Each tree is traversed recursively, with the simplification being applied to each node. Further investigation included the frequency at which the simplification should be applied. The function set used was {+, −, ×, /, if}, and the attributes along with ephemeral random constants formed the terminal set. Five hundred trees were evolved over 50 generations using GP. The initial population was created using the ramped half and half method, with a maximum tree depth of 6. Crossover had an application rate of 60%, mutation 30%, and reproduction 10%. The fitness proportionate selection method was used. Accuracy was used as the fitness function.

Two data sets from the UCI repository were used, namely WDBC and spectf. Ten-fold cross-validation was applied to validate the classifiers. Due to the random nature of GP, a total of 50 independent runs were performed. The proposed method was compared to a GP approach without simplification, neural networks, Naïve Bayes, decision trees, nearest neighbour, and the nearest centroid classifier. The results show that the proposed GP method with online simplification obtained a higher classification accuracy than the other methods. Furthermore, the proposed method showed a reduction in the total number of nodes. Finally, the researchers point out that simplification should not be applied at every generation, but at intervals of 2 to 5 generations.
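
Rule-based simplification by recursive traversal can be illustrated with a small sketch. Trees are represented here as nested tuples, which is an assumption; only the rule “a − 0” to “a” comes from the text, and the “a + 0” rule is a hypothetical addition for illustration.

```python
from typing import Tuple, Union

Node = Union[str, float, int, Tuple]


def simplify(node: Node) -> Node:
    """Simplify children first, then apply the rewrite rules to this node."""
    if not isinstance(node, tuple):
        return node  # terminal: attribute name or constant
    op, *children = node
    children = [simplify(child) for child in children]
    if op == "-" and children[1] == 0:
        return children[0]            # rule from the text: a - 0 -> a
    if op == "+" and children[1] == 0:
        return children[0]            # hypothetical extra rule: a + 0 -> a
    return (op, *children)


def count_nodes(node: Node) -> int:
    """Number of function and terminal nodes in the tree."""
    if not isinstance(node, tuple):
        return 1
    return 1 + sum(count_nodes(child) for child in node[1:])


if __name__ == "__main__":
    tree = ("+", ("-", "x0", 0), ("*", "x1", ("-", "x2", 0)))
    reduced = simplify(tree)
    print(reduced, count_nodes(tree), count_nodes(reduced))
```

Simplifying the children before the parent lets a rule fire on a node whose operands only become reducible after their own subtrees have been rewritten.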

Several methods for creating threshold values for binary classification were explored by Fitzgerald et al. [110]. The traditional threshold approach was compared to eight other proposed threshold approaches. The proposed methods allow GP to decide upon the threshold value instead of setting a fixed threshold prior to the evolutionary process. Amongst the eight methods, the Optimised Individual Class Boundaries (OICB) performed well in terms of achieving a high accuracy. OICB uses a boundary search algorithm which attempts to find the best boundary by partitioning the output values and exploring different threshold values until the most suitable ones are found. Each individual can choose its polarity based on its misclassification error. For the following explanation, assume that there are two classes, positive and negative, for a binary data set.

In binary decomposition, trees output a value greater or smaller than the threshold, and this output is then mapped to a class. In a typical situation, this mapping is determined in advance. Fitzgerald et al. define this as the polarity. Typically when a threshold of zero is used, a tree that outputs a negative value will have its output correspond to the negative class, and a tree which outputs a positive value will correspond to the positive class. This is defined in advance and remains unchanged.

In terms of the tree output, the polarity is defined according to whether the instances of the positive class are above or below the threshold. A negative polarity is where instances from the negative class are situated below the boundary value. Fitzgerald et al. proposed OICB+, in which an individual can alter its polarity so as to obtain a smaller misclassification error. The proposed methods used steady state GP with a population size of 500 trees evolved over 60 generations. The function set was given by {+, −, ×, /}, and the attributes formed the terminal set. Crossover had an application rate of 80%, and mutation 20%.

Tournament selection with a size of 5 was used. The initial population was created using the ramped half and half method. The traditional static threshold approach obtained the highest training accuracy on four out of the six data sets from the UCI repository. However, on the test set, the traditional threshold approach was outperformed by the eight proposed boundary methods. OICB+ obtained statistically significant results that outperformed the traditional threshold approach. OICB+ offers additional flexibility to the overall algorithm in comparison to the threshold approach and stands out as a novel approach for using arithmetic trees for binary classification.
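
The idea of letting an individual choose its polarity can be illustrated with a simplified sketch. This is not the OICB+ algorithm of Fitzgerald et al.: the boundary search is omitted, the threshold is fixed, and the function names, label encoding (1 for positive, 0 for negative), and tie handling are assumptions.

```python
from typing import Sequence, Tuple


def error_rate(outputs: Sequence[float], labels: Sequence[int],
               threshold: float, positive_above: bool) -> float:
    """Misclassification rate for one polarity (label 1 = positive class)."""
    wrong = 0
    for out, label in zip(outputs, labels):
        above = out > threshold
        predicted_positive = above if positive_above else not above
        if predicted_positive != (label == 1):
            wrong += 1
    return wrong / len(labels)


def choose_polarity(outputs: Sequence[float], labels: Sequence[int],
                    threshold: float = 0.0) -> Tuple[bool, float]:
    """Return (positive_above, error) for whichever polarity errs less."""
    err_above = error_rate(outputs, labels, threshold, positive_above=True)
    err_below = error_rate(outputs, labels, threshold, positive_above=False)
    if err_below < err_above:
        return False, err_below
    return True, err_above  # ties keep the conventional polarity


if __name__ == "__main__":
    outputs = [-2.0, -1.0, 1.5, 3.0]
    labels = [1, 1, 0, 0]  # positives happen to lie below the threshold
    print(choose_polarity(outputs, labels))
```

With a fixed polarity, the tree in this example would misclassify every instance, whereas flipping the polarity turns it into a perfect classifier without changing the tree itself.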
