GP thesis I wish I had written

Panel Discussion on GP thesis I wish I had written

Wednesday, 16 July, 1997, 11:30 - 12:30 pm

Genetic Programming (GP97),

Stanford, California, USA

Walter Tackett, Neuromedia walter@neurostudios.com

Suggestions for GP Students

In 5 sentences or less, what message do you want to give to students writing dissertations related to Genetic Programming?
Genetic programming has a great deal of "hack value," but very little theoretical foundation (this deficiency is inherited from GA). This shortcoming is not an impediment to being a useful practical tool: people bred animals for tens of thousands of years without benefit of theoretical foundations for genetics. However, I have seen no evidence of GP in use as a practical tool in everyday applications of science, commerce, and manufacturing to date. Things with theoretical foundations keep academics entertained and things with pragmatic engineering application make the wheels of industry turn: GP has neither. What do you propose to do about it?
Major potential topics
For one or all of the topics in 1., elaborate in terms of suggesting several specific investigations, a methodology, a problem class, related work, etc.
A) Characterize the search space on which genetic programs operate. This should have a purely theoretical basis, resulting directly and demonstrably in practical techniques which improve GP performance.
B) Define a class of problems for which GP is provably more suitable than parameter-based machine learning methods (e.g., "traditional" NNs, GA, and SA). Show that, for example, GP offers some level of computing that is more powerful than encoding FSAs as binary strings in GA/SA representations. Proofs only: examples don't count.
C) Demonstrate a real-world commercial application, continuously using live data, where GP offers a substantial and valuable improvement over all known methods including human analysis and/or human-created algorithms. No simulation allowed; algorithm must compete head-to-head live against the current method, be inarguably better, and be adopted as the standard method for solving that problem. Bonus: if you meet these criteria, you will be able to get investment capital, make money, etc.
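For readers unfamiliar with the baseline named in topic (B), the following is a minimal sketch of what "encoding an FSA as a binary string" can look like. The bit layout, the function names decode_fsa and accepts, and the parity example are all assumptions made for illustration; they are not taken from the panel or from any particular GA/SA system. It shows only the fixed-length representation that a GA or SA would search over, and, in keeping with (B), it proves nothing.

```python
from typing import List, Tuple


def decode_fsa(bits: str, n_states: int) -> Tuple[List[List[int]], List[bool]]:
    """Decode a fixed-length bit string into (transition table, accepting flags).

    Assumed layout (hypothetical, chosen for this sketch): for each state,
    one accepting bit, then the next state on input 0, then the next state
    on input 1, each next state written in `width` bits.
    """
    width = max(1, (n_states - 1).bit_length())
    per_state = 1 + 2 * width
    if len(bits) != n_states * per_state:
        raise ValueError("bit string has the wrong length for this layout")

    transitions: List[List[int]] = []
    accepting: List[bool] = []
    for s in range(n_states):
        chunk = bits[s * per_state:(s + 1) * per_state]
        accepting.append(chunk[0] == "1")
        # Wrap out-of-range state numbers so every bit string decodes to a valid FSA.
        on0 = int(chunk[1:1 + width], 2) % n_states
        on1 = int(chunk[1 + width:], 2) % n_states
        transitions.append([on0, on1])
    return transitions, accepting


def accepts(bits: str, n_states: int, word: str) -> bool:
    """Run the decoded FSA from state 0 over a word made of '0' and '1'."""
    transitions, accepting = decode_fsa(bits, n_states)
    state = 0
    for symbol in word:
        state = transitions[state][int(symbol)]
    return accepting[state]


# Two-state machine accepting words with an even number of 1s:
# state 0 = "101" (accepting, 0 -> 0, 1 -> 1), state 1 = "010" (rejecting, 0 -> 1, 1 -> 0).
PARITY = "101010"
print(accepts(PARITY, 2, "1001"))  # even number of 1s
print(accepts(PARITY, 2, "1101"))  # odd number of 1s
```

Under this layout the number of states is fixed before the search starts, so the genome length never changes. That fixed size is the property a proof for (B) would have to contrast with GP's variable-size program trees.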
What are good GP problems for a dissertation? By "good" let's mean useful for illustration and evidence that the concept you have for extending our knowledge about GP is worthwhile.
I suggest you try to create an algorithm for trading stocks and/or commodities, with live trades and real money. Use the equity in your parents' house as capital. To wit: only pragmatic, money-making problems on live data, as per my suggestion (C) above, are of any use. I take issue with the definition of "good" stated above: illustration and evidence are for use by lawyers in a courtroom in order to create a consensus of opinion, and are not a suitable means of investigation for scientists and engineers. Evidence and opinion were appropriate at the time that Koza's books were written and have served their purpose well in order to popularize the field. They cannot carry the field any further beyond where it stands today.
What overused or unillustrative problems detract from convincing you about a GP research project?
Problems involving bugs, animal behavior, simulated robots, game theory/IPD, stock market prediction from historical data (particularly lame is the S&P 500), contrived military data (my personal favorite), Boolean logic problems, problems that were solved less efficiently by so-and-so's NN/GA/Decision Tree, and the Irvine ML Database.
What is the most difficult aspect of GP of which to convince skeptical committee members? How did you convince them? What do committee members find intuitive about GP? Which of their intuitions mislead them?
The committee members I had all pretty much reflected my comments above, regarding the fact that GP is all example-based, and pretty simple examples at that. Showing a new lame example is ok for an MS thesis, at some schools at least, but not for the doctorate.
Are there any areas in GP research you would steer a dissertation-student away from? Why?
Any problem which meets the criteria of practical or theoretical soundness is a good one. In this sense, any area of investigation can be valuable or useless. I would caution candidates to avoid the trap of thinking that the problem they are working on is theoretically useful because they are running GP against a theoretically significant problem. An example of this is game theory: IPD and similar models are useful from an analytic perspective, but yield nothing of use when machines learn how to play them.
Comment on what you consider a solid methodology for experimentation and investigations based on GP.
The scientific method works pretty well, and is very much underutilized in most of the GP "investigations" that come across my desk. Showing that you made GP work on some arbitrary problem is not scientific.
How do your GP projects fit into the broad investigative perspective of your work or research?
I do not use GP in my work today because (1) there are easier ways to solve the technical challenges of my company's products, and (2) GP is a patented algorithm. I am reluctant to use a method which is the subject of someone else's patent in my business, unless there is an overwhelming argument that it is the superior method for achieving my goals.
What tools, advice and references did you find particularly useful in preparing your dissertation?
REFERENCES
Genetics
by John Maynard Smith
Probability, Random Variables, and Stochastic Processes
by Athanasios Papoulis
Artificial Intelligence
by Nils Nilsson
Most of the GA/GP literature was not very helpful. Notable exceptions included a few representative GP papers by Koza, various papers that mostly showed up in Advances In GP I (K. Kinnear, Ed.), sections of Pete Angeline's dissertation, and the paper by Forrest and Mitchell in FOGA-II.
PEOPLE
The legendary (sic) GP-roundtable discussions at ICGA '93 were of tremendous use.
Talking with John Koza gave me a good perspective on GP, of course, but perhaps more importantly on the state of GA and other ML research as it relates (or fails to relate) to GP.
Discussions with James Rice were particularly helpful, since he is the most unbiased skeptic I have encountered.
Do you have advice on computational resources, software platforms?
If what you are doing cannot run in reasonable time on a $3000 computer, then you must be able to economically justify the added expenditure of resources. This is a universal truth.
Other comments?
No, that about does it.