Chapter 4 Methodology
4. Summary
This chapter describes the methodology employed to investigate the use of genetic programming as a means of inducing solution algorithms for novice procedural programming problems. The following chapter proposes a genetic programming system for the evolution of solutions to such problems. The results of applying this system to the problems listed in Appendix A are discussed in Chapter 6.
Chapter 5 - The Proposed Genetic Programming System
This chapter proposes a genetic programming system for the induction of solutions to novice procedural programming problems. [PILL01], [PILL02] and [PILL03a] describe the proposed system and some of the results obtained. The format of the programming problem specification, which forms the input to the genetic programming system for each problem, is defined in section 1. In order to facilitate the translation of the algorithms derived by the GP system into a programming language, the genetic programming system implemented in this study is strongly-typed. Details regarding this typing are presented in section 2. Section 3 describes the structure used to represent each individual. An account of the control models utilized by the system is provided in section 4. Section 5 discusses the process used to generate the initial population. In order to ensure the reuse of the expert module of the generic architecture proposed in Chapter 2, solution algorithms will be generated in an internal representation language. Section 6 describes the terminals and functions that will form the internal representation language. A fitness measure needs to be calculated for each individual. The fitness functions used for this purpose are examined in section 7. Section 8 discusses the selection method employed by the system. The genetic operators implemented by the system are described in section 9. It is evident from the discussion presented in section 4.2 of Chapter 3 that a genetic programming system may not find a solution if it converges prematurely. Mechanisms that have been built into the system to escape local optima are discussed in section 10. According to Koza et al. [KOZA99a], one of the preparatory steps that must be performed when implementing a GP system is the determination of the major and minor parameters to be used by the system. Section 11 specifies these parameters for the GP system presented. A brief summary of the chapter is provided in section 12.
1. Programming Problem Specification
For each problem, the input to the genetic programming system is a problem specification. Each problem specification contains the following information:
• A description of the input to the problem - A variable is used to represent each input to the problem. The type and source of each variable must be specified. The source of the input can be the keyboard, a file, or memory (i.e. a variable).
• A description of the output of the problem - A variable describing each problem output must also be provided. Both the type and the destination (i.e. the screen, a file, or memory) of the variable must be specified.
• A set of fitness cases - Each fitness case provides a value (or list of values) for each input variable and the corresponding output values for each output variable defined.
• Details regarding the application domain - In order to write a program for a specific application domain, a programmer must have knowledge of that domain, e.g. to write a program that converts years to months the programmer must know that there are twelve months in a year. Such domain-specific knowledge needs to be made accessible to the genetic programming system. This knowledge takes the form of constant values in the program specification.
• Screen output - This must be specified in the case of ASCII graphics problems. Each screen output specification contains a set of x values and corresponding y values that identify positions on the screen, and the character found at each position.
• The function set that must be used - From the discussion presented in section 2.3.4 of Chapter 3 it is evident that extraneous functions degrade the performance of a GP system and usually lead to the system not finding a solution. Thus, the function set will be a subset of the internal representation language. This subset will consist of those functions that a student should have knowledge of to solve the particular problem.
Table 5.1.1 tabulates the programming problem specification for the factorial problem while Table 5.1.2 defines the programming problem specification for the ASCII graphics problem which requires the right-angled triangle illustrated in Figure 5.1.1 to be displayed on the screen.
Input variables: N
Types of input variable: Integer
Source of input variable: Screen
Input values for N: 0, 1, 2, 3, 4, 5
Constants: one: 1
Target values: fact
Target type: Integer
Target destination: Screen
Target values for fact: 1, 1, 2, 6, 24, 120
Function set: +, -, *, /, if, for, <, >, <=, >=, ==, !=
Table 5.1.1: Problem specification for the factorial problem
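The specification in Table 5.1.1 could be held in a small record structure. The following is only a minimal sketch: the names `Variable`, `ProblemSpec`, and `factorial_spec`, and the choice of Python, are illustrative assumptions and not the data structures used by the system described in this thesis.

```python
from dataclasses import dataclass

@dataclass
class Variable:
    name: str
    type: str      # e.g. "Integer"
    location: str  # source for an input, destination for an output

@dataclass
class ProblemSpec:
    # Hypothetical container mirroring the items of a problem specification.
    inputs: list
    outputs: list
    fitness_cases: list  # pairs of (input values, expected output values)
    constants: dict
    function_set: list

# Values copied from Table 5.1.1.
factorial_spec = ProblemSpec(
    inputs=[Variable("N", "Integer", "Screen")],
    outputs=[Variable("fact", "Integer", "Screen")],
    fitness_cases=[
        ({"N": n}, {"fact": f})
        for n, f in zip([0, 1, 2, 3, 4, 5], [1, 1, 2, 6, 24, 120])
    ],
    constants={"one": 1},
    function_set=["+", "-", "*", "/", "if", "for",
                  "<", ">", "<=", ">=", "==", "!="],
)
```

Pairing each input value with its target value in one fitness case keeps the two lists of Table 5.1.1 from drifting out of step.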
*
**
***
****
Figure 5.1.1: ASCII Graphics Problem
x-coordinates: 1, 2, 2, 3, 3, 3, 4, 4, 4, 4
y-coordinates: 1, 1, 2, 1, 2, 3, 1, 2, 3, 4
Character at each position: *
Constants: ch: *
Constants type: Char
Function set: place, block2, block3, for
Table 5.1.2: Problem specification for an ASCII Graphics Problem
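To illustrate how the coordinate lists in Table 5.1.2 describe the target screen, the sketch below plots them onto a character grid. It assumes that x is a 1-based row index and y a 1-based column index; this interpretation is an assumption, chosen because it reproduces the triangle of Figure 5.1.1. The function name `render` is hypothetical.

```python
def render(xs, ys, ch):
    """Plot ch at each (x, y) position and return the screen as text.

    Assumption: x is the 1-based row and y the 1-based column.
    """
    rows, cols = max(xs), max(ys)
    grid = [[" "] * cols for _ in range(rows)]
    for x, y in zip(xs, ys):
        grid[x - 1][y - 1] = ch
    # Trailing blanks are stripped so each row ends at its last character.
    return "\n".join("".join(row).rstrip() for row in grid)

# Coordinates copied from Table 5.1.2.
xs = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
ys = [1, 1, 2, 1, 2, 3, 1, 2, 3, 4]
print(render(xs, ys, "*"))
```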
2. Strongly-Typed GP
According to Bruce [BRUC95], a genetic programming system should be strongly-typed in order to facilitate the direct translation of the algorithms it generates into a programming language. The genetic programming system implemented in this study is required to generate solutions to novice procedural programming problems, which will eventually be translated into a specific programming language. Programs generated by the GP system must therefore have a legal structure, e.g. the condition of an if-statement cannot be the sum of two numbers.
Strongly-typed GP systems implement structure-preserving genetic operators and methods of initial population generation. Thus, in order to ensure the evolution of structurally correct individuals, the system implemented in the study presented in this thesis is strongly-typed.
Each terminal, constant, memory location, operator and operator argument is of a specific type.
The types catered for by the system are Integer, Real, Boolean, Char, and String. The Integer type is defined as a subtype of the Real type. Thus, during the process of initial population generation a node of the type Integer can be inserted wherever a node of the type Real is needed. Similarly, the mutation operator can replace a chosen subtree of type Real with a newly created subtree of the type Real or Integer. In order to cater for the induction of solutions to ASCII graphics problems, the type Output is also defined. A primitive of type Output does not return a value but updates the screen maintained by the system. Details regarding this screen output are presented in section 6.7.
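The subtype rule can be expressed as a compatibility check between the type a node has and the type a position requires. The sketch below is illustrative only; the table `SUBTYPE_OF` and the function `is_compatible` are hypothetical names, not part of the system described here.

```python
# Hypothetical subtype table: each key is a subtype of its value.
SUBTYPE_OF = {"Integer": "Real"}

def is_compatible(actual, required):
    """Return True if a node of type `actual` may fill a `required` slot."""
    while actual is not None:
        if actual == required:
            return True
        actual = SUBTYPE_OF.get(actual)  # step up to the supertype, if any
    return False
```

Under this rule an Integer node may fill a Real slot, but a Real node may not fill an Integer slot.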
The function set and terminal set are represented as a collection of subsets, one for each type.
When a node of a particular type is needed during tree creation, an element is randomly selected from the terminal or function subset corresponding to that type. When a tree is created, the root of the tree is chosen to be of the same type as the output that it must generate, e.g. if the system is required to generate an algorithm to calculate the factorial of a given positive integer, the root of all the trees will be of the type Integer.
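A minimal sketch of type-directed tree creation follows. The primitive sets, the terminal probability of 0.3, the depth limit, and the names `random_tree` and `return_type` are all assumptions made for illustration; the actual primitives and the initial population generation method are described in sections 5 and 6.

```python
import random

# Hypothetical primitive sets, stored as one subset per type.
TERMINALS = {
    "Integer": ["N", "one"],
    "Boolean": ["true", "false"],
}
# Each function is (name, argument types), filed under its return type.
FUNCTIONS = {
    "Integer": [
        ("+", ["Integer", "Integer"]),
        ("*", ["Integer", "Integer"]),
        ("if", ["Boolean", "Integer", "Integer"]),
    ],
    "Boolean": [
        ("<", ["Integer", "Integer"]),
        ("==", ["Integer", "Integer"]),
    ],
}

def random_tree(node_type, max_depth, rng):
    """Create a random tree whose root returns node_type."""
    if max_depth <= 1 or rng.random() < 0.3:
        return rng.choice(TERMINALS[node_type])
    name, arg_types = rng.choice(FUNCTIONS[node_type])
    # Each child is drawn from the subset matching the argument type.
    children = [random_tree(t, max_depth - 1, rng) for t in arg_types]
    return (name, children)

def return_type(tree):
    """Return the type of a tree; raise TypeError on an ill-typed argument."""
    if isinstance(tree, str):
        return next(t for t, names in TERMINALS.items() if tree in names)
    name, children = tree
    for type_name, functions in FUNCTIONS.items():
        for fn_name, arg_types in functions:
            if fn_name == name:
                if [return_type(c) for c in children] != arg_types:
                    raise TypeError(f"ill-typed arguments for {name}")
                return type_name
    raise KeyError(name)

# The factorial problem needs an Integer result, so the root is Integer.
tree = random_tree("Integer", 4, random.Random(0))
```

Because every node is drawn from the subset for the required type, a tree built this way is well-typed by construction; `return_type` only confirms it.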
Some operators and operator argument types are defined to be generic and are only instantiated during the initial population generation process. Haynes [HAYN98] and Banzhaf et al. [BANZ98] describe the advantage of using generic types to be the elimination of the need to define an operator multiple times to perform the same task for different types (discussed in section 3.3 of Chapter 3). For example, the type of an instance of the if operator, and of its second and third arguments, is only instantiated once a subtree representing its second argument is generated. Similarly, the type of the for operator is only determined when its third argument has been induced.
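The late instantiation of the if operator can be sketched as follows. The function `instantiate_if` and the dictionary layout are hypothetical, and the sketch assumes that the first argument of if is a Boolean condition, which is suggested by section 2 but not stated as a signature in the text.

```python
def instantiate_if(second_argument_type):
    """Fix the generic type of an `if` node from its second argument.

    Assumption: the first argument is a Boolean condition, while the
    second and third arguments share the operator's return type.
    """
    t = second_argument_type
    return {"returns": t, "arguments": ["Boolean", t, t]}

# Once an Integer subtree has been generated as the second argument,
# the node and its remaining branch are fixed to Integer as well.
signature = instantiate_if("Integer")
```

A single generic definition of if then serves every type, instead of one definition per type.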
The procedures for initial population creation, mutation and crossover have been implemented so as to facilitate typing. The following section describes the structure used to represent each individual.