Report on:

14th Belgium-Netherlands Conference on Artificial Intelligence, BNAIC 2002, 21-22 October 2002, Leuven, Belgium

Appears in BNVKI Newsletter, December 2002, 19(6), pp. 145-146

Session: Evolutionary Computation II, Tuesday 22 October, 16:10-17:10

Chair: William B. Langdon

The session was the last (excluding prize giving and speeches) of two happy days in Leuven. Three papers were presented:

  1. Karl Tuyls, Tom Lenaerts, Katja Verbeeck, Sam Maes and Bernard Manderick, Towards a Relation Between Learning Agents and Evolutionary Dynamics, p. 315-322.
  2. Pieter Spronck, Ida Sprinkhuizen-Kuyper and Eric Postma, Improving Opponent Intelligence by Machine Learning, p. 299-306.
  3. Robert E. Keller, Walter A. Kosters, Martijn van der Vaart and Martijn D. J. Witsenburg, Genetic Programming Produces Strategies for Agents in a Dynamic Environment, p. 171-178.
All three were original BNAIC papers.

Evolutionary Computation II was the last technical session of the conference. It took place in parallel with two other sessions, but nevertheless all three presentations were well attended. They took place in the attic Van Croy room. Despite the lateness of the day (following the tradition established by the previous day, and after another splendid lunch, we ran 30 minutes later than the pre-conference programme intended), all the presenters kept their audience's attention. Unfortunately, perhaps because the audience was tired, or because the speakers used most of their time themselves, there was only a little discussion after each lecture.

Towards a Relation Between Learning Agents and Evolutionary Dynamics

Karl Tuyls of the Vrije Universiteit Brussel gave a laid-back talk on learning agents, using variations on the standard Prisoner's Dilemma game to show that extending standard game theory results is not straightforward. When the environment is fixed, reinforcement learning (Q-learning), game theory and Markov analysis have been used to show the existence of stable fixed points in single-agent systems. However, where the environment may change and/or includes interaction with other agents (so that the Markov property does not hold), this theory must be treated with caution. Karl therefore suggested that Q-learning is cumbersome in Multi-Agent Systems (MAS).
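As a reminder of the algorithm under discussion, here is a minimal sketch of standard tabular Q-learning in a fixed, stationary environment (the toy two-state environment below is my own illustration, not an example from the paper):

```python
import random

random.seed(0)  # for reproducibility

ALPHA = 0.1   # learning rate
GAMMA = 0.9   # discount factor
EPS = 0.1     # exploration rate

N_STATES, N_ACTIONS = 2, 2
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

def step(state, action):
    """Toy deterministic (Markov) environment: action 1 in state 0 pays off."""
    reward = 1.0 if (state == 0 and action == 1) else 0.0
    next_state = (state + action) % N_STATES
    return reward, next_state

state = 0
for _ in range(5000):
    # epsilon-greedy action selection
    if random.random() < EPS:
        action = random.randrange(N_ACTIONS)
    else:
        action = max(range(N_ACTIONS), key=lambda a: Q[state][a])
    reward, nxt = step(state, action)
    # the Q-learning update rule
    Q[state][action] += ALPHA * (reward + GAMMA * max(Q[nxt]) - Q[state][action])
    state = nxt

print(Q)  # converges because the environment is Markov and stationary
```

Because this environment is stationary, the Q-values converge; it is exactly this convergence guarantee that is lost when other learning agents make the environment non-stationary, which is the point of Karl's talk.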

The paper builds on earlier extensions (cross learning) of reinforcement learning to a simple form of population learning. Each member of the (notionally infinite) population implements a pure strategy (i.e. a fixed simple strategy), but the population evolves under the effects of selection (i.e. preferential killing of poor agents).

Karl's results show that with some 2x2 game payoff matrices the population does indeed move towards the fixed points predicted by game theory (the Nash equilibrium points). However, with other payoffs (i.e. selection criteria), such as in the "battle of the sexes" game, some points became unstable and the population moved away from them and towards others.
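The convergence behaviour described above can be sketched with the standard discrete-time replicator dynamics for a symmetric 2x2 game. The Prisoner's Dilemma payoff matrix below is an assumed textbook example, not the paper's own payoffs:

```python
# Payoff matrix A[i][j]: row player's payoff for strategy i against strategy j.
# Strategies: 0 = cooperate, 1 = defect (a standard Prisoner's Dilemma).
A = [[3.0, 0.0],
     [5.0, 1.0]]

x = [0.9, 0.1]   # initial population share of each pure strategy
DT = 0.01        # Euler integration step size

for _ in range(10000):
    # expected payoff (fitness) of each pure strategy against the population
    fitness = [sum(A[i][j] * x[j] for j in range(2)) for i in range(2)]
    avg = sum(x[i] * fitness[i] for i in range(2))
    # replicator equation: dx_i/dt = x_i * (f_i - average fitness)
    x = [x[i] + DT * x[i] * (fitness[i] - avg) for i in range(2)]

print(x)  # the population converges towards all-defect, the Nash equilibrium
```

Here selection alone drives the population to the unique Nash equilibrium (mutual defection); with other payoff matrices, such as the battle of the sexes, the same dynamics can instead move the population away from some equilibria.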

Section 5.3 shows that the effect of learning on "replicator dynamics" can be very different from that predicted by Nash equilibria. "It is a known fact that the asymptotic behaviour of the learning process can differ from that of the replicator dynamics" (page 321). Figure 3 illustrates this, and shows that actual behaviour can also be very different.

In the experiments presented, the population of agents evolves as a result of selection alone. Karl would also like to investigate the effects of mutation, and to investigate similar models for Q-learning.

Improving Opponent Intelligence by Machine Learning

Pieter Spronck of Maastricht University used his laptop to good effect, illustrating his talk with a few snippets of popular computer games.

His interest is in making interactive video games less boring by ensuring that computer-generated opponents become less stupid and learn both before and during the game. This is surely a big commercial area for AI to exploit.

He laid down several ground rules for the computer game. Computer-generated opponents must not cheat. (But it is OK for the human player to do so.) By cheating he means the computer must not give itself advantages the human player does not have.

Their weapons must obey the same constraints, and so forth. He felt the games industry was (now) on top of this. Pieter named five goals for AI: 1) no cheating, 2) preventing the opponents from appearing stupid, 3) knowing how to exploit the environment, 4) learning from mistakes and 5) reacting realistically. In today's games, AI is very rarely used to address 2) and 3), and not at all for 4) and 5). Pieter felt evolutionary computing could help in these respects.

First, setting evolving computer opponent agents to play each other before the game is released (off-line learning) is a good way to test both their code and their strategies. It is interesting to note that the USAF has sponsored work by various groups, notably Rob Smith's, to use genetic algorithms and genetic programming to develop novel tactics for combat aircraft.

Secondly, Pieter suggested that agents within computer games could continue to evolve and develop new abilities after the game has been sold and taken home. Such evolution could allow the computer to adapt to the owner of the game, or to a group of players (in multi-player games).

Genetic Programming Produces Strategies for Agents in a Dynamic Environment

Walter A. Kosters of Leiden University presented Robert Keller's GP paper on evolving agents to play variants of chess with skill.

While Deep Blue's success is well known, Walter showed that learning multi-agent systems (MAS) can still make a contribution. It is important to note that each chess piece acts independently in a true distributed AI system; i.e. in these versions of chess, there are no central controlling players. Each agent must play the best it can given only information about its immediate neighbours.

A potential source of confusion is Walter and Robert's use of "communication" to describe a form of genetic operation within the evolving population. Their operation allows potentially good genes to flow through the population; it does not mean the individual chess pieces can talk to each other about board positions the others cannot see.

Walter showed that the DAI agents can learn good strategies despite being given very little feedback (via the fitness function) and despite the very loose stochastic grouping of agents into teams. Surprisingly, even feedback via winning the game seemed to be excluded, although he suggests this needs further examination.

W. B. Langdon 30 October 2002 (last update 8 Sep 2008)