RLGO is a Go program based on reinforcement learning techniques. It combines TD learning and TD search, using a million binary features matching simple patterns of stones. RLGO outperformed traditional (pre-Monte-Carlo) programs in 9x9 Go.

Source MLJ PhD ICML-08 IJCAI-07
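The core update behind the RLGO entry above can be sketched as TD(0) with a linear value function over sparse binary features. The snippet below is a minimal illustrative sketch, not RLGO's code; the pattern feature extraction and the self-play loop are omitted, and the feature count and learning rate are assumptions.

```python
import numpy as np

NUM_FEATURES = 1_000_000            # one weight per binary pattern feature (illustrative size)
weights = np.zeros(NUM_FEATURES)

def value(active):
    # Value of a position = sum of the weights of its active binary features.
    return weights[active].sum()

def td_update(active, reward, next_active, terminal, alpha=0.01, gamma=1.0):
    # TD(0): move the current position's value toward reward + gamma * value(next position).
    target = reward if terminal else reward + gamma * value(next_active)
    delta = target - value(active)
    weights[active] += alpha * delta   # sparse update: only the active features change
```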

Joel Veness’ Meep is the first master-level chess program with an evaluation function that was learnt entirely from self-play, by bootstrapping from deep searches.
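A rough sketch of the "bootstrapping from deep searches" idea: the static evaluation is regressed toward the value returned by a deep search from the same position. The linear form, feature vector and step size below are assumptions for illustration, not Meep's implementation.

```python
import numpy as np

def static_eval(weights, features):
    # Linear static evaluation: dot product of the weight vector with the position's features.
    return float(weights @ features)

def bootstrap_update(weights, features, search_value, alpha=1e-4):
    # One gradient step pulling the static evaluation toward the deep-search value
    # of the same position (the bootstrapping idea in the entry above).
    error = search_value - static_eval(weights, features)
    weights += alpha * error * features   # in-place update; gradient of the squared error
```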


Sylvain Gelly’s MoGo (2007) is a Go program based on Monte-Carlo tree search. It was the world’s first master-level 9x9 Computer Go program, and the first program to beat a human professional in even games on 9x9 boards and in handicap games on 19x19 boards.
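The selection rule at the heart of Monte-Carlo tree search can be sketched as UCT: pick the child with the best mean simulation value plus an exploration bonus. This is a generic illustration, not MoGo's code, and the constant and data layout are assumptions.

```python
import math

class Node:
    def __init__(self):
        self.children = {}        # move -> Node
        self.visits = 0
        self.total_value = 0.0

def uct_select(node, c=1.4):
    # Choose the child maximising mean simulation value plus an exploration bonus.
    def score(child):
        if child.visits == 0:
            return float("inf")   # try unvisited moves first
        mean = child.total_value / child.visits
        bonus = c * math.sqrt(math.log(node.visits) / child.visits)
        return mean + bonus
    move, child = max(node.children.items(), key=lambda item: score(item[1]))
    return move, child
```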


In a previous life, I was CTO for Elixir Studios and lead programmer on the PC strategy game Republic: the Revolution.


Real-time planning in games with hidden state, using partially observable Monte-Carlo planning (POMCP).


Demo Source
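As a rough illustration of the POMCP idea above: the belief over the hidden state is kept as a set of sampled states (particles), and planning runs Monte-Carlo simulations from particles drawn at the root. The sketch below is a flat (non-tree) simplification of POMCP, with `simulate` and `legal_actions` as assumed helpers.

```python
import random

def plan_from_belief(particles, legal_actions, simulate, num_simulations=1000):
    # Flat Monte-Carlo planning over a particle belief: sample a hidden state,
    # try an action, and keep the action with the best average simulated return.
    totals = {a: 0.0 for a in legal_actions}
    counts = {a: 0 for a in legal_actions}
    for _ in range(num_simulations):
        state = random.choice(particles)           # sample a hidden state from the belief
        action = random.choice(legal_actions)
        totals[action] += simulate(state, action)  # return of one simulated playout
        counts[action] += 1
    return max(legal_actions, key=lambda a: totals[a] / max(counts[a], 1))
```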

Monte-Carlo search in Civilization II beats the built-in AI.


Demo Source

Real-time strategy games are often plagued by pathfinding problems when large numbers of units move around the map. Cooperative pathfinding allows multiple units to coordinate their routes effectively in both space and time.

Demo AIIDE-05 AIW-06
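The idea in the cooperative pathfinding entry above can be sketched with a space-time reservation table: each unit plans in turn over (cell, time) pairs and reserves its route, so later units route around it in both space and time. The breadth-first search below stands in for the windowed A* of the papers; the grid and neighbour function are assumed.

```python
from collections import deque

def plan_route(start, goal, neighbours, reserved, max_time=200):
    # Breadth-first search over (cell, time); a unit may move or wait in place,
    # but never enter a (cell, time) pair that another unit has reserved.
    frontier = deque([(start, 0, [start])])
    seen = {(start, 0)}
    while frontier:
        cell, t, path = frontier.popleft()
        if cell == goal:
            return path
        if t >= max_time:
            continue
        for nxt in list(neighbours(cell)) + [cell]:
            if (nxt, t + 1) in reserved or (nxt, t + 1) in seen:
                continue
            seen.add((nxt, t + 1))
            frontier.append((nxt, t + 1, path + [nxt]))
    return None

def plan_all(unit_routes, neighbours):
    # Plan units one after another, reserving each finished route in space and time.
    reserved, routes = set(), []
    for start, goal in unit_routes:
        route = plan_route(start, goal, neighbours, reserved)
        routes.append(route)
        for t, cell in enumerate(route or []):
            reserved.add((cell, t))
    return routes
```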

Deep reinforcement learning approaches superhuman performance in poker, without domain knowledge.


SmooCT wins three silver medals at the Computer Poker Competition.


Deep reinforcement learning solves a variety of continuous manipulation and locomotion problems, using a single neural network architecture.

arXiv-17 ICLR-16 NIPS-16 NIPS-15


A single neural network architecture learns to play many different Atari games to human level, mapping directly from video input to joystick output.

Nature-15 NIPS-18 AAAI-17 ICLR-17 NIPS-16 AAAI-16 ICLR-16 ICML-DLW-15 NIPS-DLW-13

Demo Source
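At its core, the Atari result above trains a network with one-step Q-learning targets; the function below sketches just that target computation as a numpy stand-in. The convolutional network, replay memory and target network are omitted, and the names and constants are illustrative, not the published implementation.

```python
import numpy as np

def q_learning_targets(rewards, next_q_values, dones, gamma=0.99):
    # One-step Q-learning targets: r + gamma * max_a' Q(s', a'),
    # with the bootstrap term dropped at terminal states.
    return rewards + gamma * (1.0 - dones) * next_q_values.max(axis=1)

# Tiny usage example (two transitions, two joystick actions):
rewards = np.array([1.0, 0.0])
next_q = np.array([[0.2, 0.5], [0.1, 0.3]])
dones = np.array([0.0, 1.0])
targets = q_learning_targets(rewards, next_q, dones)   # -> [1.495, 0.0]
```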

AlphaGo Zero becomes the world’s strongest Go player, starting completely from scratch, without any human knowledge.

Nature-17 (info)

AlphaGo defeats a human professional player for the first time, by combining deep neural networks and tree search.

Nature-16 (info) ICLR-15 (older)
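The way deep neural networks and tree search are combined can be illustrated by PUCT-style selection: the policy network's prior biases exploration while backed-up value estimates accumulate in the tree. The constants and data layout below are illustrative, not the published AlphaGo details.

```python
import math

def puct_select(children, c_puct=1.5):
    # children: list of dicts with prior probability P (from the policy network),
    # visit count N, and total backed-up value W from simulations.
    total_visits = sum(child["N"] for child in children)
    def score(child):
        q = child["W"] / child["N"] if child["N"] > 0 else 0.0
        u = c_puct * child["P"] * math.sqrt(total_visits + 1) / (1 + child["N"])
        return q + u
    return max(children, key=score)
```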

First results on the new StarCraft II environment for reinforcement learning.


Max Jaderberg’s For The Win agent learns by self-play, directly from raw pixels, to play Quake III Arena: Capture the Flag at human level.

arXiv-18 (info)

Greg Wayne’s Merlin combines memory and reinforcement learning to solve tasks in DeepMind Lab, directly from raw pixels.


AlphaZero learns chess, shogi and Go by self-play, without human knowledge, and defeats existing world-champion programs in each game.

Science-18 (info) arXiv-17 (older)

DeepMind’s AlphaFold wins the biennial CASP competition for protein folding.