BACKGROUND
Genetic Programming
- GP Algorithm
- Representation of Candidate Solutions
- Initialising the Population
- Fitness Evaluation
- GP Selection
- Genetic Operators
- GP Parameters
- GP Benchmark Problems
If the depth of the tree reaches the maximum depth, child nodes are selected only from the terminal set T. In rank-based selection, the selection of parents depends on the rank of each individual rather than on its raw fitness.
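As a concrete illustration of this initialisation rule, the following is a minimal sketch of the grow method in Python. The function set F, the terminal set T, and the 0.3 early-termination probability are illustrative assumptions, not the thesis' exact settings.

```python
import random

F = ['+', '-', '*', '/']   # illustrative function set (binary operators)
T = ['x', 'y', '1.0']      # illustrative terminal set

def grow(depth, max_depth):
    """Grow initialisation: once max_depth is reached, nodes are
    drawn only from the terminal set T."""
    if depth >= max_depth or (depth > 0 and random.random() < 0.3):
        return random.choice(T)   # forced (or randomly chosen) terminal
    func = random.choice(F)
    return [func, grow(depth + 1, max_depth), grow(depth + 1, max_depth)]

tree = grow(0, max_depth=6)   # nested-list prefix representation
```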
Some Variants of GP
- Linear Genetic Programming
- Cartesian Genetic Programming
- Multiple Subpopulations GP
In the dissertation experiments, we selected a large set of regression problems, comprising benchmark GP problems recommended in the literature [120] and regression problems taken from the UCI Machine Learning Repository [4]. In an LGP program, the size of the effective code varies from zero to the total number of instructions.
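Since only the effective instructions of an LGP program contribute to its output, they can be identified with a single backward sweep. The sketch below is a hedged illustration in Python; the tuple encoding (dest, op, src1, src2) and the register names are assumptions for the example, not the thesis' representation.

```python
def effective_instructions(program, output_register):
    """Mark instructions whose destination register (transitively)
    feeds the output register; the rest are structural introns."""
    needed = {output_register}
    effective = []
    for dest, op, a, b in reversed(program):
        if dest in needed:
            effective.append((dest, op, a, b))
            needed.discard(dest)     # dest is redefined here...
            needed.update({a, b})    # ...so its sources are needed instead
    return list(reversed(effective))

# r1 is never used to compute r0, so the second instruction is an intron.
prog = [('r0', '*', 'x', 'x'), ('r1', '+', 'x', '1'), ('r0', '+', 'r0', 'x')]
print(effective_instructions(prog, 'r0'))
```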
Semantics in GP
- GP Semantics
- Survey of semantic methods in GP
- Semantics in selection and control of code bloat
AGX calculates the midpoint of the parents' semantics and randomly selects a crossover point in each parent. RDO uses the same semantic backpropagation algorithm as AGX, but its desired semantics are the target semantics of the problem.
Semantic Backpropagation
After that, a procedure is called to search a predefined library of trees for a new tree that is semantically closest to the desired semantics.
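A small sketch of that library-search step is given below, assuming semantics are vectors of a program's outputs on the fitness cases and that the desired semantics may contain '*' (don't care) entries, as in semantic backpropagation; the Euclidean distance and the dict-based library are illustrative choices.

```python
import math

def closest_in_library(library, desired):
    """Return the library tree whose semantics is closest to the
    desired semantics; `library` maps a tree to its semantics vector."""
    def distance(sem):
        # skip '*' positions: any value is acceptable there
        return math.sqrt(sum((s - d) ** 2
                             for s, d in zip(sem, desired) if d != '*'))
    return min(library, key=lambda tree: distance(library[tree]))

library = {'x': [0.0, 1.0, 2.0], 'x*x': [0.0, 1.0, 4.0]}
print(closest_in_library(library, [0.1, 0.9, 3.8]))   # -> 'x*x'
```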
Statistical Hypothesis Test
The p-value is defined as the probability of obtaining a result that is equal to or more extreme than what was actually observed when the null hypothesis is true [100]. If the p-value is less than a threshold (the significance level), the null hypothesis is rejected and the difference between the two sets of data is considered statistically significant.
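For example, the Wilcoxon signed-rank test used in this thesis can be run with SciPy as follows; the two error vectors and the 0.05 significance level are illustrative.

```python
from scipy.stats import wilcoxon

errors_a = [0.12, 0.30, 0.25, 0.40, 0.18, 0.22, 0.35, 0.28]
errors_b = [0.10, 0.28, 0.27, 0.33, 0.15, 0.20, 0.30, 0.26]

stat, p_value = wilcoxon(errors_a, errors_b)   # paired, non-parametric
if p_value < 0.05:                             # significance level alpha
    print(f"p = {p_value:.3f}: reject H0, difference is significant")
else:
    print(f"p = {p_value:.3f}: cannot reject H0")
```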
Conclusion
In particular, the solutions found by TS-S are always much smaller than those of GP on all problems. It is clear that the prediction ability of the SAT-based methods is significantly better than that of GP.
TOURNAMENT SELECTION USING SEMANTICS
Introduction
Based on the Wilcoxon signed-rank test, three variants of tournament selection are proposed to exploit semantic diversity and to explore the potential of the approach for controlling program bloat. The performance of the selection strategies is studied on a large set of regression problems, using both the original problems and noisy versions of them. The simplicity of the proposed selection strategies leaves room for further improvement.
This chapter also notes the addition of a more recent crossover operator to further improve performance.
Tournament Selection Strategies
- Sampling strategies
- Selecting strategies
The advantage of tournament selection is that it allows the selection pressure to be adjusted by changing the tournament size. The authors of [32] analyzed the selection frequency of each individual and the probability of individuals being not sampled or not selected in tournament selection with different tournament sizes. Xie and Zhang [124] proposed a method to automatically adjust the selection pressure during evolution based on the fitness rank distribution of the population.
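For reference, standard tournament selection is only a few lines; in this minimal sketch (assuming fitness is an error to be minimised), raising k increases the selection pressure.

```python
import random

def tournament_select(population, fitness, k):
    """Sample k individuals uniformly at random and return the best;
    `fitness` maps an individual to its error (lower is better)."""
    contestants = random.sample(population, k)
    return min(contestants, key=fitness)
```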
In this chapter, the thesis introduces a new method for selecting the winner in tournament selection, based on statistical analysis of the semantics of GP programs.
Tournament Selection based on Semantics
- Statistics Tournament Selection with Random
- Statistics Tournament Selection with Size
- Statistics Tournament Selection with Probability
The first proposed method is called Statistics Tournament Selection with Random, abbreviated as TS-R. The second proposed method is called Statistics Tournament Selection with Size, abbreviated as TS-S. In terms of algorithmic complexity, the time complexity of statistics tournament selection is T(n) times that of standard tournament selection, where T(n) is the time complexity of the statistical hypothesis test. The time complexity of a single tournament in standard tournament selection is O(k), where k is the tournament size.
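The following is a hedged sketch of the TS-S idea: candidates' error vectors are compared pairwise with the Wilcoxon test, and when no significant difference is found the smaller program is preferred. The tie-breaking details, the use of the sum of errors as a fitness proxy, and the `error_vector`/`size` callables are assumptions for illustration; the thesis' exact procedure may differ.

```python
from scipy.stats import wilcoxon

def ts_s_select(candidates, error_vector, size, alpha=0.05):
    """Statistics tournament selection with size (sketch)."""
    winner = candidates[0]
    for rival in candidates[1:]:
        try:
            _, p = wilcoxon(error_vector(winner), error_vector(rival))
        except ValueError:          # identical error vectors: no difference
            p = 1.0
        if p < alpha:               # significant: keep the better individual
            if sum(error_vector(rival)) < sum(error_vector(winner)):
                winner = rival
        elif size(rival) < size(winner):   # not significant: prefer smaller
            winner = rival
    return winner
```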
Experimental Settings
- Symbolic Regression Problems
- Parameter Settings
An elitism technique was also used, in which the best individual of the current generation is always copied to the next generation. The results with tour-size=5 are presented in the appendix of the thesis and at https://github.com/chuthihuong/GP. In the tables, if the result of a method is significantly better than that of GP with standard tournament selection (referred to simply as GP hereafter), the result is marked with + at the end. In addition, the best (lowest) value is underlined, and any result that is better than GP's is printed in bold.
Results and Discussions
- Performance Analysis of Statistics Tournament Selection
The test error of TS-RDO is the smallest on 13 and 15 problems with tour-size=3 and tour-size=7, respectively. TS-RDO is significantly better than GP on 17 problems with tour-size=3 and on 22 problems with tour-size=7, while the corresponding values for TS-S are only 10 and 11. The average size of the solutions found by TS-RDO is the smallest on 12 and 14 problems with tour-size=3 and tour-size=7, respectively.
On the noisy data, this method is significantly better than GP on 11 and 13 problems with tour-size=3 and tour-size=7, while the corresponding values on the noise-free data are only 10 and 11, respectively.
Conclusion
Code bloat is a phenomenon in Genetic Programming (GP) characterized by an increase in the size of individuals during the evolutionary process without a corresponding improvement in fitness. Experimental results showed that our methods significantly reduce code bloat and improve GP performance.
Introduction
Specifically, a technique to generate a new tree that is semantically similar to a target semantics is introduced and used to reduce code bloat in several different strategies. This newly proposed technique is called the Semantic Approximation Technique (SAT). In the next section, we review related work on managing code bloat in GP.
The semantic approximation technique and some strategies for reducing code bloat are presented in Section 3.3.
Controlling GP Code Bloat
- Constraining Individual Size
- Adjusting Selection Techniques
- Designing Genetic Operators
Individuals are compared to each other based on their fitness with probability p and based on their respective Pareto stratum with probability 1 − p. Operator equalisation (OE) determines a target histogram for the individuals, where the width of a bin determines the size range of the programs belonging to it and the height represents the number of individuals in it. The cut-off point is the size of the smallest individual that reaches a certain percentage of the best fitness found so far.
While MORSM minimizes the semantic distance between the selected subtree and the subprograms, MODO minimizes the distance between the desired semantics of the selected subtree and the semantics of subprograms.
Methods
- Semantic Approximation
- Subtree Approximation
- Desired Approximation
Last but not least, the size of newTree can be limited by limiting the size of sTree, and this is used to design two approaches for reducing GP code bloat in the following subsections. At each generation, after applying the genetic operators to generate the next population, k% of the largest individuals in the intermediate population (P′i) are selected and stored in a pool.
For each individual I′ in the pool, a random subtree subTree in I′ is selected using the function RandomSubtree(I′), and a small tree sTree is randomly generated.
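One plausible instantiation of the approximation step is sketched below: newTree is taken as θ·sTree, with the coefficient θ chosen by least squares so that the semantics of newTree is as close as possible to the target (the semantics S of subTree, or its desired semantics D). The closed form θ = ⟨s, t⟩ / ⟨s, s⟩ is the standard least-squares solution; whether the thesis uses exactly this scaling is an assumption here.

```python
def approximate(s_tree_semantics, target_semantics):
    """Choose theta minimising ||theta * s - t||^2, i.e.
    theta = <s, t> / <s, s>, and return the scaled semantics."""
    s, t = s_tree_semantics, target_semantics
    ss = sum(si * si for si in s)
    theta = sum(si * ti for si, ti in zip(s, t)) / ss if ss else 0.0
    return theta, [theta * si for si in s]

theta, sem = approximate([1.0, 2.0, 3.0], [2.1, 3.9, 6.2])
print(theta)   # ~2.04: newTree = theta * sTree approximates the target
```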
Experimental Settings
Second, newTree is grown to approximate the desired semantics D of subTree rather than its semantics S. By replacing subTree with a newTree that approximates the desired semantics, the individual is expected to move closer to the optimum. Conversely, if the result of a method is significantly worse than GP, it is marked with − at the end.
Additionally, if a method's result is better than GP's it is printed in bold, and the best value is underlined.
Performance Analysis
- Training Error
- Generalization Ability
- Solution Size
- Computational Time
In contrast, the training error of SA and DA is often better than that of GP, PP, and TS-S. Similarly, SA20, SAD, DA20, and DAD are significantly better than GP on most of the tested problems. PP is also significantly better than GP on 7 problems, but GP is significantly better than PP on 5 problems.
The average running time of SA and DA is significantly shorter than that of GP on most of the tested problems.
Bloat, Overfitting and Complexity Analysis
- Bloat Analysis
- Overfitting Analysis
- Function Complexity Analysis
Overall, the results in this section show that SA and DA improve both the training error and the testing error compared to GP and recent bloat control methods (PP and TS-S). The reason may be that RDO focuses only on improving the training error and not the testing error. Although the solutions of RDO are often smaller than those of GP (Table 3.5), their complexity is higher than that of GP's solutions.
The complexity of the solutions obtained by our methods is much lower than that of GP and RDO.
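For the bloat analysis referenced above, a commonly used quantitative measure (in the style of Vanneschi et al.) relates the relative growth in average program size to the relative improvement in average training error; the exact metric used in the thesis is not shown here, so this sketch is an assumption.

```python
def bloat(avg_size_g, avg_size_0, avg_fit_g, avg_fit_0):
    """Bloat at generation g: relative size growth divided by relative
    fitness improvement (fitness = training error, lower is better)."""
    size_growth = (avg_size_g - avg_size_0) / avg_size_0
    fitness_gain = (avg_fit_0 - avg_fit_g) / avg_fit_0
    return size_growth / fitness_gain if fitness_gain != 0 else float('inf')

# Size tripled while error only halved: bloat > 1 signals wasted growth.
print(bloat(avg_size_g=120, avg_size_0=40, avg_fit_g=0.25, avg_fit_0=0.5))
```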
Comparing with Machine Learning Algorithms
The test errors of the proposed models and of four machine learning systems are presented in Table 3.8. It can be seen from Table 3.8 that although standard GP is often worse than the four machine learning systems, the proposed methods are competitive with them. On some problems, such as F9 and F10, DA20 achieves much smaller test errors than RF and the other three machine learning techniques.
Overall, the results in this section show that our proposed methods often outperform three of the machine learning algorithms (LR, SVR, and DT) and are as good as the best one (RF) in generalization ability.
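A comparison setup of this kind can be reproduced with scikit-learn roughly as follows; the synthetic data and the parameter grids below are placeholders, not the values from the thesis' grid-search table.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=5, noise=0.1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    'LR': LinearRegression(),
    'SVR': GridSearchCV(SVR(), {'C': [1, 10, 100]}),
    'DT': GridSearchCV(DecisionTreeRegressor(), {'max_depth': [3, 5, 10]}),
    'RF': GridSearchCV(RandomForestRegressor(), {'n_estimators': [50, 100]}),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, mean_squared_error(y_te, model.predict(X_te)))
```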
Applying Semantic Methods to Time Series Forecasting
- Some other versions
- Time series prediction model and parameter settings
- Results and Discussion
The table indicates that the training error of all tested GP systems normally decreases as the population size increases, and that the training error of the methods based on SAT, including SAT-GP, SAS-GP, SA, and DA (hereinafter called the SAT-based methods), is usually better than GP's with population sizes of 250 and 500. However, with a population size of 1000, the mean best fitness achieved by these methods tends to be worse than GP's at all generation settings. The next measure examined is the impact of the proposed methods on reducing GP solution complexity and code bloat.
Conversely, the average size of the standard GP population grows rapidly and is much larger than that of the others across all GP parameter settings.
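For the forecasting experiments, a time series must first be cast as the kind of regression problem these methods solve; a standard sliding-window transformation (window length illustrative) is sketched below.

```python
def sliding_window(series, window):
    """One-step-ahead supervised pairs:
    X[i] = series[i : i+window], y[i] = series[i+window]."""
    n = len(series) - window
    X = [series[i:i + window] for i in range(n)]
    y = [series[i + window] for i in range(n)]
    return X, y

X, y = sliding_window([1, 2, 3, 4, 5, 6], window=3)
# X = [[1, 2, 3], [2, 3, 4], [3, 4, 5]], y = [4, 5, 6]
```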
Conclusion
List of Figures
An example of the crossover operator
An example of the mutation operator
An example of an LGP program
An example of a CGP program
Structure of MS-GP
Running the program p on all fitness cases
An example of calculating the desired semantics of the …
Testing error and population size over the generations
An example of Semantic Approximation
Average bloat over generations on four problems F1, F13, …
Average overfitting over the generations on four problems
Average complexity of the best individual over the generations

List of Tables
Median of testing error on the noisy data with tour-size=3
Average running time in seconds on noisy data with tour-size=…
Average execution time of a run (shortened as Run) and …
Median of testing error and average running time in seconds
Evolutionary parameter values
Mean of the best fitness
Average percentage of better offspring
Median of testing error
Average size of solutions
Average running time in seconds
Values of the grid search for SVR, DT and RF
Comparison of the testing error of GP and machine learning algorithms
Mean of the best fitness
Median of testing errors
Average of solutions' size
Average running time in seconds
Mean best fitness on noisy training data with tour-size=3