
Algorithm 9.7: Pseudocode for SSGA-GP

input: ensemble size
input: num gen: total number of GP generations to perform before applying the modified mutation operator
input: num offspring: total number of offspring to create using the modified mutation operator
input: num replace: total number of chromosomes to replace using inverse tournament selection
output: an ensemble represented by a GA chromosome

1 begin
2   Create the initial GP population.
3   Initialise the GA chromosomes (with size = ensemble size) by randomly selecting trees from the initial GP population.
4   Perform num gen GP generations.
5   Apply the modified mutation GA operator to the GA population to create num offspring new chromosomes.
6   Perform inverse tournament selection and replace num replace chromosomes in the GA population with the offspring created in step 5.
7   Evaluate all the GA chromosomes and store the chromosome with the highest training accuracy in memory.
8   Repeat steps 4 to 7 until the maximum number of GP generations has been reached.
9   Output the chromosome which obtained the highest training accuracy.
10 end
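The SSGA-GP pseudocode above can be sketched in Python as follows. This is a minimal, hypothetical sketch rather than the thesis implementation: the GP side is abstracted into caller-supplied functions (`init_gp_population`, `gp_generation`, `train_accuracy`), the parent of each modified-mutation offspring is assumed to be drawn uniformly at random from the GA population, and when num offspring exceeds num replace only the first num replace offspring are assumed to be inserted. The defaults for ensemble size, GA population size, maximum GP generations and tournament size are taken from Table 9.2; all function names are illustrative.

```python
import random
from typing import Callable, List, Optional, Sequence


def modified_mutation(parent: list, gp_pop: Sequence, rng: random.Random) -> list:
    """Copy the parent and replace exactly one gene with a tree from the current GP population."""
    child = list(parent)
    child[rng.randrange(len(child))] = rng.choice(gp_pop)
    return child


def inverse_tournament(ga_pop: List[list], fitness: Callable[[list], float],
                       k: int, rng: random.Random) -> int:
    """Return the index of the weakest of k randomly drawn chromosomes."""
    contestants = rng.sample(range(len(ga_pop)), min(k, len(ga_pop)))
    return min(contestants, key=lambda i: fitness(ga_pop[i]))


def ssga_gp(init_gp_population: Callable[[], list],
            gp_generation: Callable[[list], list],
            train_accuracy: Callable[[list], float], *,
            num_offspring: int, num_replace: int, ensemble_size: int = 7,
            num_gen: int = 1, ga_pop_size: int = 1000,
            max_gp_generations: int = 700, tournament_size: int = 7,
            seed: Optional[int] = None) -> list:
    if num_gen < 1:
        raise ValueError("num_gen must be at least 1")
    rng = random.Random(seed)
    gp_pop = init_gp_population()                                    # step 2
    ga_pop = [[rng.choice(gp_pop) for _ in range(ensemble_size)]     # step 3
              for _ in range(ga_pop_size)]
    best, best_acc = None, float("-inf")
    done = 0
    while done < max_gp_generations:                                 # step 8
        for _ in range(min(num_gen, max_gp_generations - done)):     # step 4
            gp_pop = gp_generation(gp_pop)
            done += 1
        offspring = [modified_mutation(rng.choice(ga_pop), gp_pop, rng)  # step 5
                     for _ in range(num_offspring)]
        for child in offspring[:num_replace]:                        # step 6
            loser = inverse_tournament(ga_pop, train_accuracy, tournament_size, rng)
            ga_pop[loser] = child
        for chrom in ga_pop:                                         # step 7
            acc = train_accuracy(chrom)
            if acc > best_acc:
                best, best_acc = list(chrom), acc
    return best                                                      # step 9


# Toy run: integer "trees", a GP generation that increments every tree,
# and the mean tree value standing in for training accuracy.
best = ssga_gp(lambda: list(range(20)), lambda pop: [t + 1 for t in pop],
               lambda chrom: sum(chrom) / len(chrom),
               num_offspring=10, num_replace=10, ga_pop_size=30,
               max_gp_generations=5, seed=1)
print(len(best))  # 7
```

Because the modified mutation swaps exactly one gene, every chromosome keeps its initial size of ensemble size trees throughout the run, and the offspring of one step are all created before any replacement takes place, mirroring steps 5 and 6.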

After each GP generation, the modified mutation operator is executed. This operator generates offspring chromosomes which may contain GP trees from the current GP population, and the offspring then replace weaker chromosomes within the SSGA population. The number of chromosomes to replace (num replace) is a user-defined parameter.

In terms of genetic operators (GOs), the GP crossover and mutation operators are responsible for optimizing the population of GP classifiers, whereas the modified mutation GA operator is responsible for optimizing the SSGA population of GA chromosome ensembles. SSGA-GP uses neither GA crossover nor the conventional GA mutation; only the modified mutation operator is applied, in order to investigate the effectiveness of this operator.

Trial runs were performed, and it was determined that for SSGA-GP the optimal chromosome ensemble size is 7. Thus, SSGA-GP initialises the chromosomes to a size of 7, and the modified mutation operator replaces a single gene within a chromosome, which ensures that the chromosomes keep a size of 7.

CHAPTER 9. HYBRIDISING EVOLUTIONARY ALGORITHMS 151

Parameter                                   Value
GP Population size                          700
GP Parent Selection Method                  Tournament selection of size 7
GP Initial Population Maximum Tree Size     7
GP Initial Population Generation Method     Ramped half and half
Maximum GP Offspring Size                   7
GP Crossover Rate                           70%
GP Mutation Rate                            30%
Maximum Number of GP Generations            700
GP Model                                    Generational model
GP Function Set                             Attributes
GP Terminal Set                             Classes
GA Population size                          1000
GA Parent Selection Method                  GA-at-end, GA-after-each and GA-with-HC: tournament selection of size 7
                                            SSGA-GP: inverse tournament selection of size 7
GA Initial Population Generation Method     Randomly select GP trees based on hybrid method
GA Recombination Rate                       GA-at-end, GA-after-each and GA-with-HC: 50%
                                            SSGA-GP: 0%
GA Mutation Rate                            GA-at-end, GA-after-each and GA-with-HC: 30%
                                            SSGA-GP: 100% (modified mutation)
Number of individuals to replace from the
SSGA population in SSGA-GP
Maximum Number of GA Generations            GA-at-end: 200
                                            GA-after-each and GA-with-HC: 20

Table 9.2: GP and GA parameters for the hybridisation experiments.

9.4 Conclusion

This chapter proposes four methods which hybridise GA and GP in order to evolve a population of ensembles. The first method executes the GA at the end of the GP run. The second executes the GA after each GP generation. The third is an extension of the second method that incorporates hill climbing into the GA recombination operator. The last method investigates the hybridisation of the steady-state GA model and GP. The proposed methods will be tested on 12 publicly available data sets.

Chapter 10 Ensemble Construction for Data Classification using Genetic Programming

10.1 Introduction

This chapter presents an ensemble construction method which creates a single GP ensemble. This approach differs from the approach described in chapter 9 because here GP builds only a single ensemble, as opposed to evolving a population of ensembles.

The ensemble construction method is introduced in section 10.2, and section 10.2.1 describes how a GP tree is selected for addition to the ensemble. Sections 10.2.2 and 10.2.3 describe how the ensemble is evaluated. Each instance is allocated a weight, and section 10.2.4 describes how these weights are updated. The experimental setup is presented in section 10.3. Finally, section 10.4 concludes this chapter.

10.2 Proposed Ensemble Construction

This section describes the proposed ensemble construction method. The ensemble is a list of GP classifier trees which vote in order to classify the instances of a data set. The method creates one ensemble during a single GP run: a tree is added to the ensemble after a certain number of GP generations, and at the end of the GP run the ensemble is output and evaluated on the test set. This section describes how the ensemble is represented and how a tree is added to it. Furthermore, it describes how weights are used to train the GP individuals and how these weights are updated. Additionally, it discusses how the GP trees and the ensemble are evaluated.

CHAPTER 10. GP ENSEMBLE CONSTRUCTION 153

Figure 10.1: Ensemble with corresponding trees at each index.

Figure 10.1 illustrates an example of an ensemble of size three, showing the GP tree stored at each index. Each tree represents a classifier. In this study, two different GP tree representations are used, depending on the data set: arithmetic trees are used when the data set contains numerical attributes, and decision trees are used when the data set contains nominal text values and discrete integer values.
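To make the voting concrete, the following Python sketch classifies an instance with an ensemble of three trees, as in Figure 10.1, using simple majority voting. It is illustrative only: each tree is modelled as a callable that returns a class label, the three arithmetic trees are hypothetical, and the rule that a positive tree output means class "A" is an assumed interpretation, not one taken from this chapter. Ties are broken in favour of the label that received its first vote earliest.

```python
from collections import Counter
from typing import Callable, List, Sequence

# A GP classifier tree is modelled as a callable mapping an instance to a class label.
Classifier = Callable[[Sequence[float]], str]


def ensemble_vote(ensemble: List[Classifier], instance: Sequence[float]) -> str:
    """Majority vote; Counter.most_common breaks ties by first-seen label."""
    votes = Counter(tree(instance) for tree in ensemble)
    return votes.most_common(1)[0][0]


def arithmetic_tree(expr: Callable[[Sequence[float]], float]) -> Classifier:
    """Hypothetical wrapper: output > 0 -> class "A", otherwise class "B"."""
    return lambda x: "A" if expr(x) > 0 else "B"


# Ensemble of size three (indices 0, 1, 2), each index holding one tree.
ensemble = [
    arithmetic_tree(lambda x: x[0] - x[1]),
    arithmetic_tree(lambda x: 2 * x[0] - 3),
    arithmetic_tree(lambda x: x[1] - 5),
]
print(ensemble_vote(ensemble, [4.0, 1.0]))  # A (trees 0 and 1 vote A, tree 2 votes B)
```

A decision-tree ensemble would plug into `ensemble_vote` unchanged, since only the label each tree returns matters to the vote.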

10.2.1 Selecting a tree to add to the ensemble

Initially, the ensemble is empty, and trees are added to it as the GP run progresses. A user-defined parameter, addFrequency, determines after how many GP generations a new tree is added to the ensemble. The pseudocode for adding a tree to the ensemble is given in algorithm 10.1. Before a tree is selected, the current fitness of the ensemble is computed, since it will be compared to the fitness of the ensemble after a new tree is added.

When a tree is to be added to the ensemble, tournament selection is performed on the current GP population. The tree selected by the tournament is added to the ensemble, the new fitness of the ensemble is computed, and this fitness is compared to the ensemble's previous fitness. If the ensemble was previously empty, the tree which results in the highest ensemble accuracy is simply added to the ensemble.

During each iteration of this algorithm, a single candidate tree is temporarily added to the ensemble in order to compute the new fitness. If the new fitness is greater than the current fitness, a reference to the candidate tree is stored as the best candidate. The algorithm iterates 20 times, and if a best candidate has been found, that candidate is permanently added to the ensemble; otherwise, the original ensemble is returned. The weights are then updated regardless of whether a tree has been added to the ensemble. This is discussed further in section 10.2.3.
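The tree-addition procedure described above can be sketched as follows. This hypothetical Python sketch (function names are illustrative, not those of algorithm 10.1) uses unweighted majority-vote training accuracy as the ensemble fitness and leaves the instance-weight update as a comment, since the weighting scheme is described in later sections. It assumes that a candidate must beat the best fitness found so far in the loop, starting from the ensemble's fitness before the addition, and that an empty ensemble always accepts its best candidate. The 20 iterations come from the text; the tournament size of 7 is an assumption borrowed from Table 9.2, and `build_ensemble` shows where addFrequency (here `add_frequency`) fits into a GP run.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence

Tree = Callable[[Sequence[float]], int]  # a GP classifier: instance -> class label


def ensemble_fitness(ensemble: List[Tree], X, y) -> float:
    """Unweighted training accuracy of majority voting (ties: first label voted)."""
    if not ensemble:
        return 0.0
    correct = 0
    for xi, yi in zip(X, y):
        votes = Counter(tree(xi) for tree in ensemble)
        correct += votes.most_common(1)[0][0] == yi
    return correct / len(y)


def tournament(population: List[Tree], key, k: int, rng: random.Random) -> Tree:
    """Standard tournament selection: fittest of k randomly drawn trees."""
    contestants = rng.sample(population, min(k, len(population)))
    return max(contestants, key=key)


def try_add_tree(ensemble, population, tree_fitness, X, y,
                 iterations=20, tournament_size=7, rng=None):
    rng = rng or random.Random()
    # Fitness before any addition; an empty ensemble accepts its best candidate.
    best_fit = ensemble_fitness(ensemble, X, y) if ensemble else float("-inf")
    best_candidate = None
    for _ in range(iterations):
        candidate = tournament(population, tree_fitness, tournament_size, rng)
        new_fit = ensemble_fitness(ensemble + [candidate], X, y)  # temporary addition
        if new_fit > best_fit:
            best_candidate, best_fit = candidate, new_fit
    if best_candidate is None:
        return ensemble                      # no improvement: original ensemble
    return ensemble + [best_candidate]       # permanent addition


def build_ensemble(population, evolve, tree_fitness, X, y,
                   generations, add_frequency, rng=None):
    ensemble = []
    for gen in range(1, generations + 1):
        population = evolve(population)      # one GP generation (caller-supplied)
        if gen % add_frequency == 0:
            ensemble = try_add_tree(ensemble, population, tree_fitness, X, y, rng=rng)
            # The instance weights would be updated here in every case (not modelled).
    return ensemble
```

Returning a new list on a successful addition, rather than mutating the ensemble in place, keeps the "temporary addition" side-effect free, so the original ensemble can be returned unchanged when no candidate improves it.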

In the document Data classification using genetic programming (pages 170-175).