Genetic Programming in C++ FAQ

a.fraser@eee.salford.ac.uk

#Q1: How do you produce repeatable results with gpc++? Every time I run it I get different results, and I would like to reproduce the population dynamics of a particular run I have found.

A: The reason you get differing results is that the function which randomly selects the functions and terminals of the genetic programs, gp_rand(), needs to be seeded. The seed is set in the file rungps.cc with the command gp_srand(). The seed currently used is the value returned by the time() command. For repeatable results this seed should be defined by the user. Obviously having the same seed for every run is pointless, so rather than recompiling each time you wish to make a new run you should instead pass the seed as a command line parameter when running the code. This means altering main() in gpmain.cc and adding the seed parameter to RunGPS() in rungps.cc. The actual code I will leave up to the reader to hack together.
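
For readers who would rather not start from nothing, here is a minimal sketch of the idea. It is not the gpc++ code: seed_from_args() and first_draws() are hypothetical names, and std::srand()/std::rand() stand in for gp_srand()/gp_rand(), whose real signatures are not given in this FAQ.

```cpp
#include <cstdlib>
#include <ctime>
#include <vector>

// Hypothetical helper: take the seed from the first command-line argument
// if one is given, otherwise fall back to the clock as rungps.cc does now.
unsigned int seed_from_args(int argc, char* argv[]) {
    if (argc > 1) {
        return static_cast<unsigned int>(std::strtoul(argv[1], nullptr, 10));
    }
    return static_cast<unsigned int>(std::time(nullptr));
}

// Stand-in for a run: std::srand()/std::rand() play the role of
// gp_srand()/gp_rand() (an assumption, see the text above).
std::vector<int> first_draws(unsigned int seed, int count) {
    std::srand(seed);
    std::vector<int> draws;
    for (int i = 0; i < count; ++i) {
        draws.push_back(std::rand());
    }
    return draws;
}
```

Two calls to first_draws() with the same seed return the same sequence, which is all that repeatability needs. If you log the seed of every run, any interesting run can be replayed later by passing that seed back in.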

#Q2: Why are there so many comments? They make it almost impossible to tell which are comments and which are actual code.

A: There are so many comments because:

1) Some people really do like them. This is one of the most commented on parts of the code (no pun intended).

2) I like to waffle and have little conversations with the user/genetic programmer. Gpc++ was coded in my own time (statements on how sad I am should be sent on postcards to the address somewhere below) which generally means very late evenings and early mornings. When I get bored I comment.

The only thing I can say is that all my code is going to have the long comments so either invest in an editor which changes colours when it prints comments and commands or delete all lines with a // in front (if you are a complete masochist).

PS: People have also commented about how comments go off the edge of the screen. If you are using Windows to code then you should be using the 800x600 mode or greater and use the full screen to edit. If you are using DOS I am sorry you are stuck with scrolling left and right. If you are in UNIX and using emacs or vi rather than some form of X text editor I have no sympathy and will just say that we are now in the 1990s and you really have no excuse.

#Q3: You have (at least) 3 methods of selecting parents for crossover; which is best?

A: This is an absolute nightmare of a problem to answer. My excuse is that it really does depend on your problem. One thing I would say is that probable fitness (roulette wheel) is considered somewhat old fashioned, though it may still be useful for some problems.

Demetic grouping is a very good, general method for controlling the amount of diversity in the population. If the probability of choosing a wanderer is low then diversity will remain high, and vice versa. Notice though that I have not defined what high and low actually mean, as again this is defined within the context of your problem.

My personal advice is to try tournament selection first; if that doesn't work, try the other methods. Most problems will yield, though, to the power of tournament selection.
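
As a rough illustration of what tournament selection does, here is a minimal sketch. It is not the gpc++ routine: the function name, the plain vector of fitness values, and the convention that higher fitness is better are all assumptions made for the example.

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Pick one parent by tournament: draw `rounds` individuals uniformly at
// random (with replacement) and return the index of the fittest one seen.
// Assumes a non-empty population, rounds >= 1, and higher fitness = better.
std::size_t tournament_select(const std::vector<double>& fitness,
                              std::size_t rounds,
                              std::mt19937& rng) {
    std::uniform_int_distribution<std::size_t> pick(0, fitness.size() - 1);
    std::size_t best = pick(rng);
    for (std::size_t i = 1; i < rounds; ++i) {
        std::size_t challenger = pick(rng);
        if (fitness[challenger] > fitness[best]) {
            best = challenger;
        }
    }
    return best;
}
```

Call it twice to get two parents for crossover. A larger value of rounds pushes harder towards the fittest individuals, while a value of 1 is just a uniform random pick.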

#Q4: Why in the main() of gpmain.cc do you get the argv and argc arguments the wrong way round?

A: Because I am stupid and have a bad memory, next question. Change them around if you wish, it will be fixed next version.

#Q5: I have noticed that when I put fewer than the correct number of arguments on the command line the system crashes rather than giving me an error.

A: Another small error on the part of the programmer. It took Mr X 4 e-mail messages to explain what was wrong with my code; I blame too many late nights myself. To fix this bug you need to change the test

if ( ( argv > 1 ) && (argv <= 4 ) )

to

if (argv != 4)

While you are fixing this you could also change the problem outlined in Q4. This again will be fixed next version.
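
A minimal sketch of the same guard with the conventional names (argc for the count, argv for the strings) might look like the following. The helper name, the usage text, and the reading of 4 as the program name plus three parameters are assumptions; adjust them to whatever your main() expects.

```cpp
#include <iostream>

// Hypothetical count: program name plus three parameters, matching the
// "4" in the test quoted above.
const int kExpectedArgs = 4;

// Returns true when the argument count is right; otherwise prints a usage
// line to stderr so the caller can exit instead of reading past argv.
bool check_arg_count(int argc, const char* program) {
    if (argc == kExpectedArgs) {
        return true;
    }
    std::cerr << "usage: " << program << " <arg1> <arg2> <arg3>\n";
    return false;
}

// Typical use at the top of main(int argc, char* argv[]):
//     if (!check_arg_count(argc, argv[0])) return 1;
```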

#Q6: When attempting to compile gpc++ the compilation fails with numerous errors, one of which is 'Cannot include iostream.h file not found'. What can I do?

A: Two things could be happening:

1) You do not have a C++ compiler but only a C version. With the GNU C/C++ compiler, for instance, the compiler executable will accept C++ code but will not work without the C++ libraries installed. If your compiler does not have C++ scrawled all over it you definitely don't have one, as the producers of said compilers love to advertise the fact. One solution to this problem is to get GNU's compiler and libraries. This is available for free for DOS (where it is called djgpp) and for UNIX. Otherwise the only option is to beg, borrow or (even, god forbid) buy a compiler. We currently use Borland's for DOS & Windows but any which are ANSI C/C++ compatible are acceptable.

2) You need to switch the C++ Always flag on. I only have experience with Borland and GNU, but with both compilers the system will only default to C++ if your files have a particular file extension (i.e. not the *.cc that I use for compatibility across a number of systems). This can be altered through the options menu in Borland; each compiler is different, though, so the only method is RTFM. In GNU the method is to always compile with g++ (rather than gcc, the C compiler); even better is to use the project and make files found within the system when uncompressed.

#Q7: Can gpc++ be made to go even faster ?

A: Not fast enough for you, hmm? Note by the way that I talk about making it even faster; that is a subtle lack of modesty on my part. Anyway, to the answer. The way gpc++ evaluates a tree is that it translates every gene node and, depending on the value returned, goes to whichever gene is next. The best speed trick is to limit the number of sub-routines which the evaluation procedure has to recurse into. (NB: Is it still true that C/C++ compiled code saves all registers to the stack before entering a procedure?) This can be done by taking all the function calls made by Translate...() and moving their code into that function instead. This makes the code almost incomprehensible but you cannot have everything.
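
To give a flavour of the trick, here is a minimal sketch of a tree evaluator with the operator code folded into one function. The Node layout and the three operators are hypothetical and much simpler than gpc++'s own gene class; only the shape of the idea carries over.

```cpp
// Hypothetical node layout; gpc++'s own gene class is different.
enum Op { CONSTANT, ADD, MULTIPLY };

struct Node {
    Op op;
    double value;       // used only when op == CONSTANT
    const Node* left;   // null for constants
    const Node* right;  // null for constants
};

// Each operator's work is written directly in its case, so the only
// calls left are the recursive ones needed to walk the tree.
double evaluate(const Node* node) {
    switch (node->op) {
        case CONSTANT:
            return node->value;
        case ADD:
            return evaluate(node->left) + evaluate(node->right);
        case MULTIPLY:
            return evaluate(node->left) * evaluate(node->right);
    }
    return 0.0;  // not reached for valid ops; keeps compilers quiet
}
```

Optimising compilers will often inline small helper functions on their own, so it is worth timing the code before and after making a change like this by hand.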

You can also play around with the compiler options: in Borland you can set the type of compilation so it will produce faster code, and the same is true of GNU (probably). Please don't mail me to ask how, as I know very little about optimisation techniques, but instead RTFM.

#Q8: Why the large number of files in your release? Some contain very little actual code.

A: This is to help in the production of code libraries and to keep executables relatively small. Once you produce a library you can link it in to whatever files you like and the linker should decide what is relevant. However, rather than including just the particular function you call, it has to include all the code in the object file which holds that function. Hence by increasing the number of files, and so the number of object files after compilation, I make the library more efficient.

#Q9: When uncompressing gpc++ with pkunzip a number of files with the same name are produced and have to be overwritten or ignored.

A: Use pkunzip -d; this will recreate the directories as well as the files.

#Q10: When running gpc++ it produces the output file, but on viewing the file the average fitness and length are complete garbage.

A: This is a PC-only problem and I am really sorry about this one, as it was one of those problems I did not foresee at all. It took three cups of coffee before I found the incorrect assumption. The projects as supplied assume that you are using a DX, i.e. a processor with a math co-processor. The compilation makes use of this to calculate the average fitness and length with a floating point division carried out on the co-processor. Obviously when the code runs on an SX you get garbage. To use an SX you need to set the Compiler Options to emulate the floating point processor rather than expressly using the 80387 chip.

#Q11: Surely you shouldn't name the variable CFLAGS in the Makefile as you are using C++.

A: Right again, this should be changed to CCFLAGS or some other name (GNU make's built-in C++ rules use CXXFLAGS). I am in complete ignorance of the creation of Makefiles and most of the more esoteric UNIX commands. Again, fixed next version.

#Q12: Why in the UNIX compilation did you not set up the -O optimisation flags?

A: The reason is quite simply that I found it a bit flaky. While our workstation will happily run with optimisation, my Linux version of GNU (the same version number) kept giving errors. If you compile with optimisations and it works, fine, but I had to produce something which would work every time.

#Q13: Why is gpc++ public domain?

A: I use Linux, GNU compilers, gnuplot, X11, clisp, pkzip and many other public domain packages. I expect people to pay me for my knowledge of coding and genetic programming, and also to send money in appreciation; to force payment on such a general package seems somehow wrong. Also I would like to see as many people using GP as possible. What I would really like is for GP to solve something that a person had never done before (Fermat's Last Theorem anyone?). It is also really nice when someone comments on how well your code worked or even sends you a cheque!

#Q14: What would happen if I offered the author a job which researched genetic programming?

A: He would say yes, probably.

Written and 'html'ised by Adam Fraser. If you would like to show your appreciation of the code which this FAQ documents, I do accept cheques. Please send whatever you think the code is worth.

Gpc++ v0.1 was released in March 93.

Gpc++ v0.4 (this version) June 94.

This document: 13 September 1994