The experimental work described in this thesis was carried out in the School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Pietermaritzburg, from February 2014 to April 2016, under the supervision of Professor Nelishia Pillay. Where the work of others has been used, this is duly acknowledged in the text.
PLAGIARISM
PUBLICATIONS
The third objective is to compare the performance of GP and GE for automatic object-oriented programming. Object-oriented genetic programming (OOGP), a variation of OOGP, namely Greedy OOGP (GOOGP), and GE approaches to automatic object-oriented programming were implemented. The results show that both GP and GE can be used for automatic object-oriented programming.
INTRODUCTION
- Purpose of the Study
- Objectives
- Contributions
- Dissertation Layout
Genetic programming will be developed and evaluated to produce code for classes and for programs that use those classes. This is the first study to investigate the use of GE for automatic object-oriented programming. The performance of genetic programming and grammatical evolution was compared for automatic object-oriented programming.
GENETIC PROGRAMMING
- Introduction
- Introduction to Genetic Programming Algorithm
- Control Model
- Generational Control Model
- Steady state control Model
- Program Representation
- Function and Terminal Set
- Function Set
- Terminal Set
- Sufficiency and Closure
- Initial Population Generation
- The Full Method
- The Grow Method
- Ramped Half-and-Half
- Evaluation
- Fitness Cases
- Fitness function
- Selection Methods
- Tournament Selection
- Fitness-Proportionate Selection
- Genetic Operators
- Reproduction
- Crossover
- Mutation
- Termination Criteria
- Advancements in Genetic Programming
- Strongly Typed GP
- The Use of Memory
- Iteration
- Modularization
- Bloat in GP
- Strengths and Weaknesses of GP
- Strengths
- Weakness
- Setting up a Genetic Programming System
- Chapter Summary
It was found that grammatical evolution scales better than genetic programming when developing code for classes.
GRAMMATICAL EVOLUTION
Introduction
Grammatical evolution uses the genotype-phenotype distinction and the mapping of genotype space to phenotype space to produce a program. The genotype must be expressed as a program; the mapping process is detailed in Section 3.6, while wrapping is discussed in Section 3.7. Each evolutionary algorithm has its advantages and disadvantages; Section 3.14 highlights those of grammatical evolution.
Introduction to Grammatical Evolution
A Grammar in Grammatical Evolution
The number of production rules, say 𝑝𝑛, for each of the non-terminals is 3. By structuring the grammar so that some functions are not allowed to take certain terminals as arguments, the grammar can be used to obtain a better solution to the problem at hand. Production rules can also be structured to produce combinations of terminals in prefix or postfix notation.
Genotype Representation
A small change in genotype that results in a small change in phenotype is known as high locality. Conversely, if a small change in genotype results in a large change in phenotype, the algorithm is said to have low locality and, as such, is likely to perform a random search. An advantage of integer representation is that it eliminates the time spent converting codons from binary to integer.
Initial Population Generation
Mapping from the Genotype to the Phenotype
To illustrate the mapping process, the chromosome shown in Figure 3.4 is mapped with the BNF grammar defined in Section 3.4.1 as follows. The first codon value is 234 and the number of production rules for the start symbol is 3. Thus, variety in genotypic space is not associated with the same variety in phenotypic space.
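The first mapping step described above can be written out as a short worked calculation. This assumes the usual GE convention that the production rule is chosen as the codon value modulo the number of rules for the current non-terminal, with rules indexed from 0; the symbols $c$ and $r$ are introduced here for illustration only.

```latex
\[
\text{rule index} = c \bmod r = 234 \bmod 3 = 0,
\qquad \text{since } 234 = 3 \times 78 + 0 .
\]
```

Under this convention, the first production rule (index 0) of the start symbol is selected. Any codon that is a multiple of 3 would select the same rule, which is one way different genotypes can lead to the same phenotype.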
Wrapping
As in natural evolution, GE exhibits a many-to-one mapping, i.e., more than one genotype can be mapped to the same program. One of the disadvantages of this is that the population may contain many individuals with the worst possible fitness as a result of incomplete mapping.
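The mapping, wrapping, and many-to-one behaviour described above can be sketched in a few lines of Python. This is a minimal illustration and not the implementation used in this thesis: the grammar, the function name `map_genotype`, and the `max_wraps` parameter are assumptions made for the example, and the grammar is not the one defined in Section 3.4.1.

```python
import re

# Hypothetical grammar for illustration only (not the grammar of Section 3.4.1).
GRAMMAR = {
    "<expr>": ["<expr> <op> <expr>", "<var>", "<const>"],
    "<op>": ["+", "-", "*"],
    "<var>": ["x", "y", "z"],
    "<const>": ["1", "2", "3"],
}
NON_TERMINAL = re.compile(r"<[^<>\s]+>")


def map_genotype(codons, start="<expr>", max_wraps=2):
    """Map integer codons to a program string.

    Returns None when the mapping is still incomplete after `max_wraps`
    passes over the chromosome (an invalid individual).
    """
    if not codons:
        return None
    program, position, wraps = start, 0, 0
    while True:
        match = NON_TERMINAL.search(program)      # leftmost non-terminal
        if match is None:
            return program                         # only terminals remain
        if position == len(codons):                # all codons consumed
            if wraps == max_wraps:
                return None                        # incomplete mapping
            wraps += 1
            position = 0                           # wrapping: reuse codons
        rules = GRAMMAR[match.group()]
        choice = rules[codons[position] % len(rules)]  # codon mod rule count
        position += 1
        program = program[:match.start()] + choice + program[match.end():]


print(map_genotype([234, 1, 0, 4, 2, 5]))    # x - 3
print(map_genotype([0, 4, 3, 7, 5, 2]))      # x - 3 (different genotype, same program)
print(map_genotype([0, 1, 2]))               # z + z (completed by wrapping once)
print(map_genotype([0, 1, 2], max_wraps=0))  # None (incomplete without wrapping)
```

The first two calls show two different genotypes mapping to the same program. The last two show the same chromosome yielding a valid program only when wrapping is allowed. A chromosome such as `[0]` never completes, because it always selects the recursive rule, so it is returned as `None` and would typically be assigned the worst possible fitness.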
Evaluation
Selection
Genetic Operators
- Crossover
- Mutation
The result is 0, which is the number of the production rule stored at the first index of H1. The homologous crossover uses more memory and increases run time because it requires the history of the selected rules to be saved. In general, a crossover operator aims to search near the current solution in hopes of a better solution.
The operator seeks to increase the diversity of the population by taking the search to a new area of the search space. Nodal mutation changes a single node in a derivation tree, while structural mutation changes the structure of the derivation tree. While nodal mutation searches more of the neighborhood of the candidate solution, structural mutation searches new areas of the search space.
This could be achieved by specifying the percentage of mutation operations that should be nodal and the percentage that should be structural. Castle and Johnson [45] investigated the effect of the mutation operator in terms of the mutation point. While the crossover operator aims to exploit the neighborhood of the candidate solution, the mutation operator aims to explore the search space.
Striking a balance between the probabilities of both the mutation and crossover operators will balance the exploration and exploitation capabilities of the search algorithm.
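The interplay between the two operators can be illustrated with a small Python sketch on integer-codon chromosomes. It shows a generic one-point crossover and a per-codon mutation, not the homologous crossover or the nodal and structural mutations discussed above. All function and parameter names (`one_point_crossover`, `codon_mutation`, `make_offspring`, `p_crossover`, `p_mutation`, `max_codon`) are assumptions made for this example.

```python
import random


def one_point_crossover(parent_a, parent_b, rng):
    """Swap the tails of two codon lists at independently chosen cut points.

    Both parents must contain at least two codons.
    """
    cut_a = rng.randint(1, len(parent_a) - 1)
    cut_b = rng.randint(1, len(parent_b) - 1)
    child_a = parent_a[:cut_a] + parent_b[cut_b:]
    child_b = parent_b[:cut_b] + parent_a[cut_a:]
    return child_a, child_b


def codon_mutation(codons, rate, rng, max_codon=255):
    """Replace each codon by a new random value with probability `rate`."""
    return [rng.randint(0, max_codon) if rng.random() < rate else c
            for c in codons]


def make_offspring(parent_a, parent_b, p_crossover, p_mutation, rng):
    """Create one offspring; the two probabilities set the balance
    between exploitation (crossover) and exploration (mutation)."""
    if rng.random() < p_crossover:
        child, _ = one_point_crossover(parent_a, parent_b, rng)
    else:
        child = list(parent_a)
    return codon_mutation(child, p_mutation, rng)


rng = random.Random(0)
print(make_offspring([10, 20, 30, 40], [1, 2, 3, 4, 5], 0.9, 0.05, rng))
```

A high `p_crossover` with a low `p_mutation` keeps offspring close to their parents, whereas raising `p_mutation` pushes the search towards unexplored regions at the cost of disrupting good partial solutions.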
Termination Criteria
Bloat in GE
Modularization in GE
For a symbolic regression problem, an example production rule for the starting symbol is <expr>. The non-terminal <code> is expanded to get the main program, while the non-terminal <adfcode> is expanded to get the ADF. While the number of ADFs a program must contain is preset in Ryan [47], Harper and Blair [46] use production rules to determine the number of DDFs a program must contain.
If the latter is the case, the expansion ends; otherwise, the result is further expanded to determine the number of DDFs to include and the number of parameters each of the DDFs should take. While Harper and Blair [46] use one chromosome during the mapping process to obtain a program that has a main program and one or more DDFs, Hemberg et al. The first is used to expand the terminals that determine the number of ADFs a program must have and the number of parameters each of the ADFs must take.
The second is used to generate the following: the main program, the body of the functions, and a call to one or more functions. DDF and (GE)² are more difficult to implement than the ADFs implemented by Ryan. (GE)² uses more memory than DDF because it uses two separate chromosomes. While the use of ADFs degrades GP performance on a simple problem [7], Hemberg et al.
Benefits of GE
Generated programs are easy to understand: Because bloat does not occur in GE-generated programs, they are easy to understand. The search can easily be biased to get a better success rate: By restricting the syntax of the grammar, GE focuses on certain areas of the search space. The challenges presented by typing and closure in GP are avoided: Valid syntactic structures are specified in the grammar.
Thus, using a grammar overcomes the challenges of typing and closure in GP.
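The way a grammar sidesteps typing and closure can be shown with a small Python sketch. The grammar below is hypothetical and was written only for this illustration: numeric and Boolean expressions are kept under separate non-terminals, so a condition can only ever be a Boolean expression. The helper `derive` expands the grammar at random rather than from a chromosome, purely to keep the example short.

```python
import random

# Hypothetical grammar: numeric and Boolean expressions are separate
# non-terminals, so every derived program is type-correct by construction.
GRAMMAR = {
    "<num>": [["(", "<num>", "+", "<num>", ")"],
              ["(", "<num>", "if", "<bool>", "else", "<num>", ")"],
              ["x"], ["1"]],
    "<bool>": [["(", "<num>", "<", "<num>", ")"],
               ["(", "not", "<bool>", ")"],
               ["True"]],
}


def derive(symbol, depth, rng):
    """Randomly expand `symbol`; at depth 0 only terminal-only rules are used."""
    if symbol not in GRAMMAR:
        return symbol                      # terminal: emit as-is
    rules = GRAMMAR[symbol]
    if depth == 0:
        rules = [r for r in rules if not any(s in GRAMMAR for s in r)]
    rule = rng.choice(rules)
    return " ".join(derive(s, depth - 1, rng) for s in rule)


rng = random.Random(1)
expression = derive("<num>", 3, rng)
value = eval(expression, {"__builtins__": {}}, {"x": 3})  # x is an assumed input
print(expression, "=", value)
```

Every expression derived from `<num>` evaluates to an integer and every expression derived from `<bool>` evaluates to a Boolean, without any type checking at crossover or mutation time. In a tree-based GP system the same guarantee would require either closure of the function set or strong typing.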
Chapter Summary
GP AND AUTOMATIC OOP
Introduction
GP and Programming Paradigm
GP for Automatic Procedural Programming
Automatic Object-Oriented Genetic Programming
Object-Oriented Genetic Programming for GP Scalability
Analysis and Justifications
- Analysis of GP for Automatic OOP
- Analysis of GE for Automatic OOP
Chapter Summary
METHODOLOGY
Introduction
Research Methodologies
Achieving the Objectives using the Proof by Demonstration Methodology
Performance Evaluation and Statistical Testing
Description of the Object-Oriented Programming Problems
- The Abstract Data Types (ADTs)
- Problems Solved Using the Evolved ADTs
- The Object-Oriented Programming Problems Classification Based on Difficulty
Chapter Summary
GENETIC PROGRAMMING APPROACH FOR AUTOMATIC
Introduction
Programming Problem Specification
An Overview of the OOGP Algorithm
Program Representation
The Initial Population
- The Internal Representation Language
Fitness Evaluation
Selection
Genetic Operators
- The Crossover Operator
- The Mutation Operator
Termination Criteria
The Greedy Object-Oriented Genetic Programming (GOOGP) Approach to
Chapter Summary
GRAMMATICAL EVOLUTION APPROACH FOR AUTOMATIC
Introduction
Programming Problem Specification
The OOGE Algorithm
- The Grammar
- Program Representation
- Initial Population Generation
- Fitness Evaluation
- Selection
- Genetic Operators
- Termination Criteria
Chapter Summary
FITNESS EVALUATION AND PARAMETERS FOR THE STACK ADT
Introduction
Programming Problem Specification for the Stack ADT
- Fitness Evaluation
- OOGP and GOOGP Primitives
- OOGE Grammar for the Stack ADT
Parameters for the Stack ADT
Programming Problem Specification for Problem1
- Fitness Evaluation
- GOOGP Primitives
- OOGE Grammar
Parameters for Problem1
Chapter Summary
FITNESS EVALUATION AND PARAMETERS FOR THE QUEUE
Introduction
Programming Problem Specification for the Queue ADT
- Fitness Evaluation
- OOGP and GOOGP Primitives
- OOGE Grammar
Parameters for the Queue ADT
Programming Problem Specification for Problem2
- Fitness Evaluation
- OOGP and GOOGP Primitives
- OOGE Grammar
Parameters for Problem2
Chapter Summary
FITNESS EVALUATION AND PARAMETERS FOR THE LIST ADT
Introduction
Programming Problem Specification for the List ADT
- Fitness Evaluation
- OOGP and GOOGP Primitives
- OOGE Grammar
Parameters for the List ADT
Programming Problem Specification for Problem3
- Fitness Evaluation
- OOGP and GOOGP Primitives
- OOGE Grammar
Parameters for Problem3
Chapter Summary
RESULTS AND DISCUSSION
Introduction
The Stack Abstract Data Type (ADT) and Problem1
- Comparison of OOGP, GOOGP and OOGE Performance for the Stack ADT
The Queue Abstract Data Type (ADT) and Problem2
- Comparison of OOGP, GOOGP and OOGE Performances for the Queue ADT
- Comparison of GOOGP and OOGE Performances for Problem2
The List Abstract Data Type (ADT) and Problem3
- Comparison of GOOGP and OOGE Performances for Problem3
Conversion of the Solutions to a Programming Language
Performance Comparison with Other Studies
Chapter Summary
CONCLUSION AND FUTURE WORK
Introduction
This chapter summarizes the findings of this thesis and presents a conclusion for each of the objectives outlined in Chapter 1.
Objectives and Conclusion
- Objective 1: Evaluate Genetic Programming for Automatic Object-Oriented
- Conclusion to Objective 1
- Objective 2: Evaluate Grammatical Evolution for Automatic Object-Oriented
- Conclusion to Objective 2
- Objective 3: Compare the Performance of Genetic Programming and
- Conclusion to Objective 3
Based on the research, Object-Oriented Grammatical Evolution (OOGE) was developed and evaluated to produce code for classes. As with OOGP and GOOGP, each object-oriented programming problem used to test the approach involves two classes: a driver class and the Abstract Data Type (ADT) class. The approach was thus tested to produce code for the stack, queue, and list ADTs, and code for programming problems that use the ADTs.
It was found that OOGE successfully produced code for the object-oriented programming problems tested. For complex problems, OOGE may require ADFs to generate code. It was found that, for most problems, OOGE achieved a higher success rate than OOGP and GOOGP.
As with GE [1], this leads to the preservation of diversity in the population and slows down the convergence of the algorithm, thus allowing the algorithm to avoid local optima. For the list ADT, the success rate of GOOGP is slightly higher than that of OOGE when both approaches use ADFs. OOGP achieved a success rate of 0% at all difficulty levels, while GOOGP achieved a success rate competitive with that of OOGE in most cases.
Since the main goal is to produce code that correctly implements classes for the object-oriented programming problem at hand, the success rate is a more important criterion than the average run time and average fitness.
Future Work
Chapter Summary
Banzhaf, W., Nordin, P., Keller, R.E., Francone, F.D.: Genetic Programming: An Introduction: On the Automatic Evolution of Computer Programs and Its Applications (The Morgan Kaufmann Series in Artificial Intelligence).
Pillay, N.: An investigation into the use of genetic programming for the induction of novice procedural programming solution algorithms in intelligent programming tutors.
Pillay, N.: A genetic programming system for the induction of iterative solution algorithms for novice procedural programming problems. In: Proceedings of the 2005 annual research conference of the South African Institute of Computer Scientists and Information Technologists on IT research in developing countries.
Silva, S., Costa, E.: Dynamic limits for bloat control in genetic programming and a review of past and current bloat theories.
Grammar for the Stack ADT
Grammar for Problem1
Grammar for the Queue ADT
Grammar for Problem 2
Grammar for Problem 3
Grammar for Problem1
Grammar for Problem 2
Grammar for Problem 3
APPENDIX C: GOOGP RUN THAT PRODUCED THE CODE FOR THE LIST ADT AND A SOLUTION FOR THE LIST ADT CONVERTED TO JAVA
The Java code given below was converted from a solution generated in run 3 of GOOGP using the seed in the first row of Table C.1. Most of the functions in the internal representation language have an equivalent instruction in a particular programming language and can be converted directly.
Program Requirements
How to Run the Program
- Step 1 – Executable jar (.jar) file
- Step 2 – Selecting a problem
- Step 3 – Selecting and applying an approach to produce code for a selected
OOGP, GOOGP, and OOGE are used to produce code for 3 object-oriented programming problems, each involving 2 classes. GOOGP is selected to run if the checkbox labeled "Use Greedy OOGP" is checked.
Input Parameters and Editing
- Text fields
- Check Buttons
- Combo Box
Maximum Descendant Depth: This specifies the maximum depth that offspring created by the genetic operators are allowed to have. Crossover: This specifies the percentage of offspring to be created by applying the crossover operator. Mutation: This specifies the percentage of offspring to be created by applying the mutation operator.
Ext_Crossover: This specifies the probability that the external crossover applies to an individual in the population. Int_Crossover: This specifies the probability that the internal crossover applies to an individual in the population. It should be noted that the same solution may not be generated if the seed or some other parameter changes.
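The effect of the seed on reproducibility can be illustrated with a short Python sketch. It is independent of the Java program described in this appendix: the function `initial_population` and its parameters are assumptions made for the example, and it only demonstrates the general behaviour of a seeded pseudo-random number generator.

```python
import random


def initial_population(seed, size=4, length=6, max_codon=255):
    """Create a population of random integer chromosomes from a given seed."""
    rng = random.Random(seed)      # private generator, fully determined by seed
    return [[rng.randint(0, max_codon) for _ in range(length)]
            for _ in range(size)]


same_seed = initial_population(42) == initial_population(42)
other_seed = initial_population(42) == initial_population(43)
other_param = initial_population(42) == initial_population(42, length=7)
print(same_seed, other_seed, other_param)
```

With the same seed and the same parameters, the run starts from an identical population and therefore follows the same search trajectory. Changing either the seed or any parameter that affects how random numbers are consumed leads to a different run.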
This allows the user to decide whether or not to use a particular option. "Use Mutation Depth" enables or disables the "Enter Depth" field, while "Use Seed". The combo box labeled "Test seed" can be used to select one of the default seeds used to test the approaches.
For the final runs of each approach, the grow method of initial population generation was used.
Indicators
The raw fitness calculation
The standardized fitness calculation
The adjusted fitness calculation
The normalized fitness calculation
The inverse fitness proportionate selection probability calculation