Named Memory Operators - String Manipulation Operators

Chapter 4 Methodology

6. Primitives

6.2 String Manipulation Operators

6.4.1 Named Memory Operators

Named memory is the equivalent representation of a variable in an imperative program. The operators, write and change are available to access a named memory location. When the write operator is added as a node to a parse tree during the process of initial population generation, a memory location of the specified type, i.e. Boolean, Char, Integer, Real or String, is instantiated in the individual's memory structure. A variable name corresponding to the location is added to the terminal set of the individual currently being created.

The write operator takes a single argument. During the execution of the parse tree, the result of evaluating the argument of the write operator is written to the memory location referenced by the variable linked to the particular instance of the write operator. The write operator also returns this value. Figure 5.6.4.1.1 illustrates an individual containing an instance of the write operator and its corresponding memory structure. The type of an instance of the write operator is only instantiated during the process of initial population generation when the subtree representing its argument is generated, i.e. the write operator is generic.

+ Named memory

! ~

Figure 5.6.4.1.1: Using the write operator. Assume that x= 1 and y= 1.

The change operator takes two arguments, namely, a first argument of any type, and a second argument which is a variable name of an allocated memory location. The change operator is only added as a node to a parse tree if memory locations of the same type as its first argument already exist, i.e. the individual in question already contains a write node. During execution of the parse tree the value stored at the memory location specified by the second argument of the change operator is replaced by the result of evaluating its first operator.

The change operator also returns the original value stored. The type of an instance of the change operator is only instantiated during the process of initial population and is the same as that of its first argument. Figure 5.6.4.1.1 illustrates the effect of adding a branch containing an instance of the change operator to the tree in Figure 5.6.4.1.2. Note that in this case the value returned by the change operator will be 2.

varO#

x Named memory

y y

I VarO#l l

I

Figure 5.6.4.1.2: Using the change operator

Checks are performed during crossover and mutation to ensure that the second child of the change operator is always a variable.

6.4.2 Indexed Memory Operators

The aread operator is used to access indexed memory. A read takes one argument, an integer index to the indexed memory and returns the value stored at this index. In order to ensure that the closure property is met by the function a value of one is returned if the memory location referenced by aread operator is not accessible. If an input variable in the program specification requires the use of indexed memory the function alen is added to the terminal set. Alen evaluates to the number of elements in the input list. During the evaluation process, prior to the execution of a parse tree on a particular fitness case, the indexed memory of the individual is initialized to store the values of the corresponding input variable specified in the fitness case.

6.5 Conditional Control Structures

Three conditional control operators, namely, if, switchi, and swtichc are provided by the genetic programming system.

The

if

operator performs the function of an if-else statement and thus takes three arguments. The first argument represents a condition and is of the type Boolean. The second and third arguments can be of any type. The type of these arguments are instantiated during the process of creating the initial population. Both these arguments will be instantiated to the same type. If the first argument of the ifoperator evaluates to true the result of evaluating its second argument is returned otherwise the result of evaluating its third argument is returned. Figure 5.6.5.1 illustrates a parse tree that uses an instance of the

if

operator.

y x

Figure 5.6.5.1: Parse tree that calculates the maximum of two numbers

The switchi and switchc operators perform the same function as a switch statement for integer and character values respectively. These operators perform the same function as the CASE statement proposed by Pringle [PRING95] (section 5 of Chapter 3).

The first argument of the switchi operator represents the value to be tested against n options and is of type integer. This argument cannot be a constant. The next n arguments represent these options. Each option is represented as list of integer values with the comb connector combining the list. The arity of the comb connector is randomly selected in the range of 1 to the maximum value specified. In a switchi operator instance the comb operator is an integer while in a switchc operator instance it is a character. Figure 5.6.5.2 illustrates an example subtree.

Figure 5.6.5.2: Represents a list of options in a switchi instance

The number of options, n, tested by a particular instance of this operator is chosen randomly, where 3<= n <=7. It was felt that an upper bound of 7 will be sufficient for novice procedural programming problems. The GP system also randomly chooses whether a particular instance of the switchi or switchc operator should have a default option. Hence, the arity of each switchi operator will be n*2+ 1 or n*2+2 (if a default option exists).

The last n (or n+ 1) arguments of the switchi operator represent the corresponding actions that will be performed if at least one value in the corresponding option list is equal to the first argument.

The types of these arguments are instantiated during the initial population generation process.

These arguments are all of the same type. The switchc operator performs the same function as that performed by the switchi operator for character values.

In order to ensure that the closure property is met, if none of the cases are satisfied and a default option does not exist in a particular instance, the switchi operator returns a one and the switchc operator returns the empty character. If a value in more than one option list is equal to the first argument of the switchi or switchc operator an error message is returned. Figure 5.6.5.3 displays an instance of the switchc operator.

y

Figure 5.6.5.3: An instance of the switchc operator

The maximum arity of the switchi and switchc operators, denoted by switch_max, and the maximum arity of the comb connector, comb_max, are problem dependant and thus added to the set of GP parameters.

If one of the subtrees chosen during crossover has the comb connector as the root and the other does not, the operation is abandoned and new crossover points are chosen. Similarly, if the subtree chosen to be replaced during mutation has the comb connector as a root, the newly generated subtree also has a comb connector, possibly of a different arity than that being replaced, as a root.

6.6 Iterative Control Structures

Section 3.1.3 of Chapter 3 provides an account of the iterative structures that have been used in genetic programming systems. This section describes the iterative operators that form part of the internal representation language. In order to facilitate the translation of evolved algorithms to a procedural programming language the syntax and functioning of these operators have been chosen to be similar to that of iterative operators commonly used by procedural programming languages.

The iterative control structures provided by the genetic programming system are the for, while, and dowhile operators which implement the for, while and dowhile loops respectively.

The for operator takes three arguments. The first argument represents the initial loop value and the second argument represents the final loop value. Thus, both values are of the type Integer. The third argument can be of any type and is evaluated on each iteration of the loop. The type of this argument, and hence the particular instance of the for operator, is instantiated during the process of initial population generation. The number of iterations performed by eachfor operator instance is equal to the difference between its first and second argument plus one.

The while and dowhile operators take two arguments, the first of which represents a condition and is thus of the type Boolean. The second argument represents the parse tree which will be executed on each iteration. The while operator only performs an iteration if its first argument evaluates to true.

The dowhile operator performs one iteration after which an iteration is only performed if its first argument evaluates to true.

Two variables are maintained for each of the iterative control structures. The first is a counter variable which is incremented on each iteration. This variable serves the same purpose as the INDEX terminal used by Koza et al. [KOZA99a] in their implementation of automatically defined iterations and loops. In the case of the for operator this variable can be incremented or decremented and is given an initial value equal to the first argument of the for operator. In the case of the while and dowhile operators the counter variable is initialized to one. The second variable maintained by the system stores the result of each iteration and will be referred to as the iteration variable. The initial value given to this variable is context-sensitive and is denoted by the

"@" character. For example, suppose this variable is represented by lvar and the body of the loop

is lvar + N then lvar will be given an initial value of zero. However, if the body of the loop was lvar

*

N, lvar will be given an initial value of one. Thus, the initial value given to this variable is dependent on the operator applied to it. When the algorithm generated by the GP system is converted into a particular programming language, a programming statement initializing this variable will be placed before the iterative structure. The counter and iteration variables are automatically updated on each iteration of a loop instance. If the type of the loop instance is Output, an iteration variable is not maintained for the instance.

When an instance of the for operator is instantiated, both these variables are added to the terminal set when creating the third argument of the for operator. Similarly, when the while or dowhile operator is added as a node to a parse tree the counter variable is added to the terminal set when creating the parse tree representing the first argument and both variables are added to the terminal set when generating the second argument of the while or dowhile operators. If the for operator exists in the function list in a problem specification, the constant one is added to the terminal set.

One of the difficulties associated with incorporating the use of iteration into a genetic programming system is the problem of overcoming the halting problem [BANZ98]. The literature proposes a number of methods of dealing with this problem. These methods are described in section 3.1.2 of Chapter 3. For the purposes of the study presented in this thesis, a similar approach to that taken by Koza [KOZA92] has been employed to prevent infinite and time-consuming iterations. An upper bound on the number of iterations, max_it, that can be performed by each individual is set.

This value is problem dependant and treated as a GP parameter.

In some instances the loop represented by the for operator and the while operator may not be executed. In order to satisfy the closure property of the function set it is essential that a value is still returned by the for and while operators in these instances. The GP system returns an error message if the loop is not executed. If either of the first two arguments passed to an instance of the for operator is the error message an error message is returned. Similarly, if either of these values is Infinity or -Infinity an error message is also returned.

Figure 5.6.6.1 illustrates the use of an instance of the for operator in an algorithm that calculates the factorial of an integer value N. Figure 5.6.6.2 and Figure 5.6.6.3 depict the use of the while and dowhile operators respectively in an algorithm which returns the sum of numbers from one to the given integer input N. The 0 in countO#l specifies the position of the individual in the population, in this case the first individual, and the 1 specifies the counter variable number. The notation used in the iteration variable IvarO#l is defined in the same way.

Figure 5.6.6.1: The use of the for operator in calculating the factorial of the integer N

N

Figure 5.6.6.2: An instance of the while operator

6.7 Input and Output Operators

N

Figure 5.6.6.3: An instance of the dowhile operator

From the survey of introductory programming courses conducted, it is evident that novice procedural programming problems have one or more inputs and one or more outputs. Thus, each algorithm derived by the genetic programming system will have one or more inputs and one or more outputs. These inputs and outputs are defined as part of the program specification. The code required to obtain these input and output values are fairly standard. Thus, it was decided that the genetic programming system will not be required to generate the code needed to obtain the input values and output the values computed by the algorithm. These standard code components, e.g.

reading inputs values from a file, will be added to the algorithm derived by GP system. A further advantage of this approach is a reduction in the computational effort involved in inducing the algorithms using genetic programming.

In order to facilitate this approach the definition of the input and output variables in the problem specification must include a description of the source/s of the input variables and the destinationls of the outputs of the algorithm. Valid sources and destinations include the keyboard/screen, a file, a function and memory.

Ascn graphic problems are usually used to teach novices simple output and iteration ( [DEITOI]).

These problems require characters to be written to specific positions on the screen.

To cater for this category of problems the genetic programming system implemented uses a multidimensional array to represent the screen. The system keeps track of two screen variables, currentx and currenty. These variables store the x and y-coordinates of the current screen position, i.e. the position that will be written to next. These variables are updated each time the screen is written to.

The programming problem specification for an ASCn graphics problem must be defined as a set of x- and y-coordinate pairs and the character that must appear at the intersection of both coordinates, for each pair, on the screen. The dimensions of the multidimensional array representing the screen correspond to the maximum x- coordinate and y-coordinate specified in the fitness cases. The "-" character is used to represent a blank screen position. Two constants are added to the terminal set, namely, the maximum value in the x-coordinate and y-coordinate ranges listed in the problem specification. If the terminal set does not contain a constant value of one, one is also added to the terminal set.

The place operator is included in the function set to display characters at certain positions on the screen. The place operator takes three arguments: a x-coordinate value, a y-coordinate value and the character to be written to the screen. The first two arguments are of the type Integer while the third argument can be of any type. The type of the third argument of the place operator is randomly chosen from the existing types during the process of initial population generation. The type of the place operator is Output. Executing an instance of the place operator results in its third argument being written to the point ( i.e array position) specified by its first and second arguments.

Both the screen variables, currentx and currenty are updated accordingly.

If either the x- coordinate value or the y- coordinate is less than one, the invalid coordinate is replaced by a one and the error character, represented by"!", is written to the corresponding position on the screen. Similarly, ifthe x-coordinate or y-coordinate is greater than the maximum value permitted it is replaced by the maximum value and the error character is written to this screen position. Furthermore, if the place operator attempts to write a character to a screen position that has already been written to, the error character is written to this position. The number of times a screen location has been written to is also recorded. If the error message is passed to the place operator as either a first or second operand the operand is replaced with maximum value permitted and the error character is written to the corresponding screen position.

In order to cater for those languages that may not contain a command that writes output to specific locations on the screen, the toscreen and newline operators are defined as part of the internal representation language. Both these operators are of type Output. The toscreen operator takes a single argument which it writes to the screen. This argument can be of any type and is instantiated during initial population generation. The character is written to the screen position specified by currentx and currenty. Both the screen variables are updated accordingly: if currentx or currenty is greater than the maximum x-coordinate or y-coordinate respectively, the variable is decreased by one and the error character is written to the screen. Execution of the newline operator results in currentx being incremented and currenty being set to zero. Section 7 discusses how the fitness of an individual is calculated for these problems.

Table 5.6.7.1 displays the problem specification for an ASCn graphics problem and Figure 5.6.7.1 illustrates an example of an individual using the place operator and the corresponding screen representation maintained by the GP system.

x-coordinates 1,2,2,3,3,3,4,4,4,4

y-coordinates 1,1,2,1,2,3,1,2,3,4

Character at each position

*

Constant

*

Constant type Char

Function set block3, block2, place, for Table 5.6.7.1: Programming Problem Specification for an ASCII Graphics Problem

countO#l ch

Screen representation

o

1 2 3

o *

1 ^f---l--^*-f---+----j

*

~~----~--~-4

Figure 5.6.7.1: An individual containing the place operator and the corresponding screen representation maintained by the GP system 6.8 Recursive Operators

Section 3.1.4 of Chapter 3 describes studies that have been conducted to evolve recursive algorithms using genetic programming. The approach taken by Brave [BRA V96] is similar to that commonly employed by imperative programming languages. The system developed by Brave [BRA V96] induces recursive algorithms by allowing a parse tree representing an ADF to call itself.

Novice recursive programming problems usually require the student to write a recursive method or function. The recur operator will be used to represent a function call made by a parse tree to itself.

The recur operator allows a parse tree to call itself recursively. The arity of this operator is equal to the number of input variables for the problem. Each argument represents the corresponding input variable when a recursive call is made to the algorithm represented by the parse tree. The type of each argument of the recur operator is chosen to be the same as that of the input variable the argument represents. The type of the recur operator is set to be the same as that of the root node of the parse tree in which it appears. Thus, the type of each instance of this operator is instantiated during the process of initial population generation. The recur operator is prohibited from being inserted at the root of the tree.

Dalam dokumen An investigation into the use of genetic programming for the induction of novice procedural programming solution algorithms in intelligent programming tutors. (Halaman 129-140)