A study of genetic programming and grammatical evolution for automatic object-oriented programming.

The experimental work described in this thesis was carried out in the School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Pietermaritzburg, from February 2014 to April 2016, under the supervision of Professor Nelishia Pillay. Where the work of others has been used, this is duly acknowledged in the text.

PLAGIARISM

PUBLICATIONS

The third objective is to compare the performance of GP and GE for automatic object-oriented programming. Object-oriented genetic programming (OOGP), a variation of OOGP, namely Greedy OOGP (GOOGP), and GE approaches to automatic object-oriented programming were implemented. The results show that both GP and GE can be used for automatic object-oriented programming.

INTRODUCTION

Purpose of the Study
Objectives
Contributions
Dissertation Layout

Genetic programming will be developed and evaluated to produce code for classes and for the programs that use those classes. This is the first study investigating the use of GE for object-oriented automatic programming. The performance of genetic programming and grammatical evolution was compared for object-oriented automatic programming.

GENETIC PROGRAMMING

Introduction
Introduction to Genetic Programming Algorithm
Control Model

Generational Control Model
Steady-State Control Model

Program Representation
Function and Terminal Set

Function Set
Terminal Set

Sufficiency and Closure
Initial Population Generation

The Full Method
The Grow Method
Ramped Half-and-Half

Evaluation

Fitness Cases
Fitness Function

Selection Methods

Tournament Selection
Fitness-Proportionate Selection

Genetic Operators

Reproduction
Crossover
Mutation

Termination Criteria
Advancements in Genetic Programming

Strongly Typed GP
The Use of Memory
Iteration
Modularization

Bloat in GP
Strengths and Weaknesses of GP

Strengths
Weaknesses

Setting up a Genetic Programming System
Chapter Summary

It was found that grammatical evolution scales better than genetic programming when developing code for classes.

GRAMMATICAL EVOLUTION

Introduction

Grammatical evolution uses the genotype-phenotype distinction and the mapping of genotype space to phenotype space to produce a program. The genotype must be expressed as a program; the mapping process is detailed in Section 3.6, while wrapping is discussed in Section 3.7. Each evolutionary algorithm has its advantages and disadvantages; Section 3.14 highlights those of grammatical evolution.

Introduction to Grammatical Evolution

A Grammar in Grammatical Evolution

The number of production rules, say 𝑝𝑛, for each of the non-terminals is 3. By structuring the grammar so that some functions are not allowed to take certain terminals as arguments, the grammar can be used to obtain a better solution to the problem at hand. Production rules can be structured in a way that facilitates combinations of terminals in prefix or postfix notation.

Genotype Representation

A small change in genotype that results in a small change in phenotype is known as high locality. Conversely, if a small change in genotype results in a large change in phenotype, the algorithm is said to have low locality and, as such, is likely to perform a random search. An advantage of integer representation is that it eliminates the time spent converting codons from binary to integer.

Initial Population Generation

Mapping from the Genotype to the Phenotype

To illustrate the mapping process, the chromosome shown in Figure 3.4 is mapped with the BNF grammar defined in Section 3.4.1 as follows. The first codon value is 234 and the number of production rules for the start symbol is 3. Thus, variety in genotypic space is not associated with the same variety in phenotypic space.
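The rule selection step used in this example can be sketched as follows. This is a minimal illustration that assumes the standard GE convention of taking the codon value modulo the number of production rules; the class and method names are hypothetical and are not taken from the thesis implementation.

```java
final class CodonRule {
    /** Returns the index of the production rule selected by a codon (codon mod rule count). */
    static int chooseRule(int codon, int ruleCount) {
        if (ruleCount <= 0) {
            throw new IllegalArgumentException("ruleCount must be positive");
        }
        return Math.floorMod(codon, ruleCount);
    }

    public static void main(String[] args) {
        // 234 = 78 * 3, so the first codon selects rule 0 of the start symbol.
        System.out.println(chooseRule(234, 3));
        // 237 selects the same rule, so different genotypes can yield the same phenotype.
        System.out.println(chooseRule(237, 3));
    }
}
```

Because several codon values select the same rule, two chromosomes that differ in genotype space may still map to the same program.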

Wrapping

As in natural evolution, GE exhibits a many-to-one mapping, i.e., more than one genotype can be mapped to the same program. One of the disadvantages of this is that the population may contain many individuals with the worst possible fitness as a result of incomplete mapping.
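Wrapping, the many-to-one mapping, and incomplete mapping can be illustrated with a small mapper. The sketch below is an assumption-laden example: the grammar is a hypothetical one with three rules per non-terminal (it is not the grammar of Section 3.4.1), the leftmost non-terminal is always expanded first, and `maxWraps` is an illustrative parameter name.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Optional;

final class WrappingMapper {
    // Hypothetical grammar (assumption): three production rules per non-terminal.
    static final Map<String, List<List<String>>> GRAMMAR = Map.of(
        "<expr>", List.of(List.of("<expr>", "<op>", "<expr>"), List.of("<var>"), List.of("1")),
        "<op>", List.of(List.of("+"), List.of("-"), List.of("*")),
        "<var>", List.of(List.of("x"), List.of("y"), List.of("z")));

    /** Maps a chromosome to a program; returns empty if mapping is still incomplete after maxWraps wraps. */
    static Optional<String> map(int[] chromosome, int maxWraps) {
        if (chromosome.length == 0) {
            return Optional.empty();
        }
        Deque<String> pending = new ArrayDeque<>();
        pending.push("<expr>");
        List<String> program = new ArrayList<>();
        int next = 0;
        int wraps = 0;
        while (!pending.isEmpty()) {
            String symbol = pending.pop();
            List<List<String>> rules = GRAMMAR.get(symbol);
            if (rules == null) {              // terminal symbol: emit it
                program.add(symbol);
                continue;
            }
            if (next == chromosome.length) {  // codons exhausted: wrap or give up
                if (wraps == maxWraps) {
                    return Optional.empty();  // incomplete mapping
                }
                wraps++;
                next = 0;
            }
            List<String> chosen = rules.get(Math.floorMod(chromosome[next++], rules.size()));
            // Push in reverse so the leftmost symbol of the rule is expanded first.
            for (int i = chosen.size() - 1; i >= 0; i--) {
                pending.push(chosen.get(i));
            }
        }
        return Optional.of(String.join(" ", program));
    }
}
```

Under these assumptions, the chromosomes {0, 1, 1} and {3, 4, 4} both map to the same program when one wrap is allowed, while {0} never completes because it always selects the recursive rule; in a GE system such an individual would typically be assigned the worst possible fitness.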

Evaluation

Selection

Genetic Operators

Crossover
Mutation

The result is 0, which is the number of the production rule stored at the first index of H1. The homologous crossover uses more memory and increases run time because it requires the history of the selected rules to be saved. In general, a crossover operator aims to search near the current solution in hopes of a better solution.

The operator seeks to increase the diversity of the population by taking the search to a new area of the search space. Nodal mutation changes a single node in a derivation tree, while structural mutation changes the structure of the derivation tree. While nodal mutation searches more of the neighborhood of the candidate solution, structural mutation searches new areas of the search space.

This could be achieved by specifying the percentage of mutation operations that should result in nodal or structural mutation. Castle and Johnson [45] investigated the effect of the mutation operator in terms of the mutation point. While the crossover operator aims to exploit the neighborhood of the candidate solution, the mutation operator aims to explore the search space.

Striking a balance between the probabilities of both the mutation and crossover operators will balance the exploration and exploitation capabilities of the search algorithm.

Figure 3.6. A one-point variable-length crossover
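A one-point variable-length crossover can be sketched as follows. This is a minimal illustration that assumes the common GE formulation in which a cut point is chosen independently in each parent and the tails are swapped; the class and method names are hypothetical and do not come from the thesis implementation.

```java
import java.util.Arrays;
import java.util.Random;

final class OnePointCrossover {
    /** Swaps the tails of two parents at the given cut points; offspring lengths may differ from the parents. */
    static int[][] crossAt(int[] p1, int cut1, int[] p2, int cut2) {
        int[] c1 = new int[cut1 + (p2.length - cut2)];
        int[] c2 = new int[cut2 + (p1.length - cut1)];
        System.arraycopy(p1, 0, c1, 0, cut1);                   // head of parent 1
        System.arraycopy(p2, cut2, c1, cut1, p2.length - cut2); // tail of parent 2
        System.arraycopy(p2, 0, c2, 0, cut2);                   // head of parent 2
        System.arraycopy(p1, cut1, c2, cut2, p1.length - cut1); // tail of parent 1
        return new int[][] {c1, c2};
    }

    /** Chooses one cut point per parent at random (assumption: uniform over all positions). */
    static int[][] cross(int[] p1, int[] p2, Random rng) {
        return crossAt(p1, rng.nextInt(p1.length + 1), p2, rng.nextInt(p2.length + 1));
    }

    public static void main(String[] args) {
        int[][] kids = crossAt(new int[] {1, 2, 3, 4}, 1, new int[] {9, 8, 7}, 2);
        System.out.println(Arrays.toString(kids[0]) + " " + Arrays.toString(kids[1]));
    }
}
```

Because the two cut points are independent, the offspring generally have different lengths from their parents, although the total number of codons across the pair is preserved.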

Termination Criteria

Bloat in GE

Modularization in GE

For a symbolic regression problem, an example production rule for the starting symbol is <expr>. The non-terminal <code> is expanded to get the main program, while the non-terminal <adfcode> is expanded to get the ADF. While the number of ADFs a program must contain is preset in Ryan [47], Harper and Blair [46] use production rules to determine the number of DDFs a program must contain.

If the latter is the case, the expansion ends; otherwise, the result is further expanded to determine the number of DDFs to include and the number of parameters each of the DDFs should take. While Harper and Blair [46] use one chromosome during the mapping process to obtain a program that has a main program and one or more DDFs, Hemberg et al. use two chromosomes in (GE)2. The first is used to expand the terminals that determine the number of ADFs a program must have and the number of parameters each of the ADFs must take.

The second is used to generate the following: the main program, the body of the functions, and a call to one or more functions. DDF and (GE)2 are more difficult to implement than the ADFs implemented by Ryan. (GE)2 uses more memory than DDF because it uses two separate chromosomes. While the use of ADFs degrades GP performance on a simple problem [7], Hemberg et al.

Benefits of GE

Generated programs are easy to understand: Because bloat does not occur in GE generated programs, they are easy to understand. The search can easily be biased to get a better success rate: By restricting the syntax of the grammar, GE focuses on certain areas of the search space. The challenges presented by typing and closure in GP are avoided: valid syntactic structures are specified in the grammar.

Thus, using a grammar overcomes the challenges of typing and closure in GP.

Chapter Summary

GP AND AUTOMATIC OOP

Introduction

GP and Programming Paradigm

GP for Automatic Procedural Programming

Automatic Object-Oriented Genetic Programming

Object-Oriented Genetic Programming for GP Scalability

Analysis and Justifications

Analysis of GP for Automatic OOP
Analysis of GE for Automatic OOP

Chapter Summary

METHODOLOGY

Introduction

Research Methodologies

Achieving the Objectives using the Proof by Demonstration Methodology

Performance Evaluation and Statistical Testing

Description of the Object-Oriented Programming Problems

The Abstract Data Types (ADTs)
Problems Solved Using the Evolved ADTs
The Object-Oriented Programming Problems Classification Based on Difficulty

Chapter Summary

GENETIC PROGRAMMING APPROACH FOR AUTOMATIC

Introduction

Programming Problem Specification

An Overview of the OOGP Algorithm

Program Representation

The Initial Population

The Internal Representation Language

Fitness Evaluation

Selection

Genetic Operators

The Crossover Operator
The Mutation Operator

Termination Criteria

The Greedy Object-Oriented Genetic Programming (GOOGP) Approach to

Chapter Summary

GRAMMATICAL EVOLUTION APPROACH FOR AUTOMATIC

Introduction

Programming Problem Specification

The OOGE Algorithm

The Grammar
Program Representation
Initial Population Generation
Fitness Evaluation
Selection
Genetic Operators
Termination Criteria

Chapter Summary

FITNESS EVALUATION AND PARAMETERS FOR THE STACK ADT

Introduction

Programming Problem Specification for the Stack ADT

OOGP and GOOGP Primitives
OOGE Grammar for the Stack ADT

Parameters for the Stack ADT

Programming Problem Specification for Problem1

GOOGP Primitives
OOGE Grammar

Parameters for Problem1

Chapter Summary

FITNESS EVALUATION AND PARAMETERS FOR THE QUEUE

Introduction

Programming Problem Specification for the Queue ADT

OOGE Grammar

Parameters for the Queue ADT

OOGE Grammar

Chapter Summary

FITNESS EVALUATION AND PARAMETERS FOR THE LIST ADT

Introduction

Programming Problem Specification for the List ADT

OOGE Grammar

Parameters for the List ADT

OOGE Grammar

Chapter Summary

RESULTS AND DISCUSSION

Introduction

The Stack Abstract Data Types (ADT) and Problem1

Comparison of OOGP, GOOGP and OOGE Performance for the Stack ADT

The Queue Abstract Data Types (ADT) and Problem2

Comparison of OOGP, GOOGP and OOGE Performances for the Queue ADT
Comparison of GOOGP and OOGE Performances for Problem2

The List Abstract Data Types (ADT) and Problem3

Comparison of GOOGP and OOGE Performances for Problem3

Conversion of the Solutions to a Programming Language

Performance Comparison with Other Studies

Chapter Summary

CONCLUSION AND FUTURE WORK

Introduction

This chapter summarizes the findings of this thesis and provides a conclusion to each of the objectives outlined in Chapter 1.

Objectives and Conclusion

Objective 1: Evaluate Genetic Programming for Automatic Object-Oriented
Conclusion to Objective 1
Objective 2: Evaluate Grammatical Evolution for Automatic Object-Oriented
Objective 3: Compare the Performance of Genetic Programming and

Based on the research, Object-Oriented Grammatical Evolution (OOGE) was developed and evaluated to produce code for classes. As with OOGP and GOOGP, each object-oriented programming problem used to test the approach includes two classes: the driver class and the Abstract Data Type (ADT) class. The approach was thus tested to produce code for the stack, queue, and list ADTs, and code for programming problems that use the ADTs.

It was found that OOGE successfully produced code for the object-oriented programming problems tested. Complex problems may require ADFs for OOGE to generate code for an object-oriented programming problem. It was found that for most problems, OOGE achieved a higher success rate compared to the success rate of OOGP and GOOGP.

As with GE [1], this leads to the preservation of diversity in the population and slows down the convergence of the algorithm, thus allowing the algorithm to avoid local optima. For the list ADT, the success rate of GOOGP is slightly higher than that of OOGE when both approaches use ADFs. OOGP achieved a success rate of 0% at all difficulty levels, while GOOGP achieved a success rate competitive with that of OOGE in most cases.

Since the main goal is to produce code that correctly implements classes for the object-oriented programming problem at hand, the success rate is a more important criterion than the average run time and average fitness.

Future Work

Chapter Summary

Banzhaf, W., Nordin, P., Keller, R.E., Francone, F.D.: Genetic Programming: An Introduction: On the Automatic Evolution of Computer Programs and Its Applications (The Morgan Kaufmann Series in Artificial Intelligence).
Pillay, N.: An investigation into the use of genetic programming for the induction of novice procedural programming solution algorithms in intelligent programming tutors.
Pillay, N.: A genetic programming system for the induction of iterative solution algorithms for novice procedural programming problems. In: Proceedings of the 2005 annual research conference of the South African Institute of Computer Scientists and Information Technologists on IT research in developing countries.
Silva, S., Costa, E.: Dynamic limits for bloat control in genetic programming and a review of past and current bloat theories.

Grammar for the Stack ADT

Grammar for Problem1

Grammar for the Queue ADT

Grammar for Problem 2

Grammar for Problem1

Grammar for Problem 3

APPENDIX C: GOOGP RUNS THAT PRODUCED CODE FOR THE LIST ADT AND A SOLUTION FOR THE LIST ADT CONVERTED TO JAVA

The Java code given below was converted from a solution generated in run 3 of GOOGP using the seed in the first row of Table C.1. Most of the functions in the internal representation language have an equivalent instruction in a particular programming language and can be converted.

Table C.1 Run numbers and seeds for the GOOGP runs that found a solution for the list ADT

Program Requirements

How to Run the Program

Step 1 – Executable jar (.jar) file
Step 2 – Selecting a problem
Step 3 – Selecting and applying an approach to produce code for a selected

OOGP, GOOGP, and OOGE are used to produce code for 3 object-oriented programming problems, each involving 2 classes. GOOGP is selected to run if the checkbox labeled "Use Greedy OOGP" is checked.

Figure E.3 The GUI interface for Problem selection

Input Parameters and Editing

Text fields
Check Buttons
Combo Box

Maximum Descendant Depth: This specifies the maximum depth that offspring created by the genetic operators are allowed to have.
Crossover: This specifies the percentage of offspring to be created by applying the crossover operator.
Mutation: This specifies the percentage of offspring to be created by applying the mutation operator.
Ext_Crossover: This specifies the probability that the external crossover is applied to an individual in the population.
Int_Crossover: This specifies the probability that the internal crossover is applied to an individual in the population.

It should be noted that the same solution may not be generated if the seed or some other parameter changes.
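The dependence of a run on its seed can be illustrated with a short sketch. It assumes the random number generator is `java.util.Random` and that codons are drawn from the range 0 to 255; both are assumptions made for illustration, and the class and method names are hypothetical.

```java
import java.util.Random;

final class SeededInit {
    /** Creates a chromosome of the given length from a seeded generator (assumed codon range 0..255). */
    static int[] randomChromosome(long seed, int length) {
        Random rng = new Random(seed);
        int[] codons = new int[length];
        for (int i = 0; i < length; i++) {
            codons[i] = rng.nextInt(256);
        }
        return codons;
    }
}
```

Two calls with the same seed and length return identical chromosomes, which is why a run is reproducible only when the seed and every other parameter are kept the same.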

This is used to allow the user to make a decision whether to use a particular action or not. "Use Mutation Depth" enables or disables the "Enter Depth" field, while "Use Seed". The combo box labeled "Test seed" is useful for selecting one of the default seeds used to test the approaches.

For the final runs of each approach, the grow method of initial population generation was used.

Indicators

The raw fitness calculation

The standardized fitness calculation

The adjusted fitness calculation

The normalized fitness calculation

The inverse fitness proportionate selection probability calculation