Applications

RLGO is a Go program based on reinforcement learning. It combines TD learning with TD search, using a million binary features that match simple patterns of stones. RLGO outperformed traditional (pre-Monte-Carlo) programs in 9x9 Go.

Source MLJ PhD ICML-08 IJCAI-07
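
RLGO's value function is, in essence, a linear combination of binary pattern features trained by temporal-difference learning. A minimal sketch of a TD(0) update over such features (the feature indexing, hyperparameters, and function name here are illustrative, not RLGO's actual ones):

```python
def td0_update(weights, active, next_active, reward,
               alpha=0.1, gamma=1.0, terminal=False):
    """One TD(0) step for a linear value function over binary features.
    Only the indices of the currently active features are passed in."""
    v = sum(weights[f] for f in active)                     # V(s)
    v_next = 0.0 if terminal else sum(weights[f] for f in next_active)
    delta = reward + gamma * v_next - v                     # TD error
    for f in active:
        weights[f] += alpha * delta  # gradient w.r.t. each active weight is 1
    return delta
```

With binary features the gradient is just the set of active indices, which is what makes a million-feature representation cheap to update.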

Joel Veness' Meep is the first master-level chess program whose evaluation function was learnt entirely from self-play, by bootstrapping from deep searches.

NIPS-09
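
The bootstrapping idea can be sketched as regressing a linear static evaluation toward the value returned by a deep search of the same position (RootStrap-style; the function name and parameters below are hypothetical):

```python
def rootstrap_update(weights, features, search_value, alpha=0.001):
    """Nudge a linear static evaluation toward the value returned by a
    deep search of the same position (squared-error gradient step)."""
    static_eval = sum(w * x for w, x in zip(weights, features))
    error = search_value - static_eval
    for i, x in enumerate(features):
        weights[i] += alpha * error * x
    return error
```

The search, not a human teacher, supplies the training target, so the evaluation improves purely from self-play.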

Sylvain Gelly's MoGo (2007) is a Go program based on Monte-Carlo tree search. It was the world's first master-level 9x9 Computer Go program, and the first to beat a human professional in even games on 9x9 boards and in handicap games on 19x19 boards.

Source CACM AIJ PhD AAAI-08 ICML-07
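
At the heart of Monte-Carlo tree search is the UCT selection rule: descend the tree by picking the child that maximizes mean value plus an exploration bonus. A minimal sketch (the data layout is illustrative):

```python
import math

def uct_select(parent_visits, child_stats, c=math.sqrt(2)):
    """UCB1 applied to tree search (UCT). child_stats maps
    move -> (visits, total value); unvisited moves are tried first."""
    best, best_score = None, float("-inf")
    for move, (n, w) in child_stats.items():
        if n == 0:
            return move                      # always expand unvisited moves
        score = w / n + c * math.sqrt(math.log(parent_visits) / n)
        if score > best_score:
            best, best_score = move, score
    return best
```

The bonus term shrinks as a move is sampled more, so simulation effort shifts toward the moves that are currently winning.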

In a previous life, I was CTO for Elixir Studios and lead programmer on the PC strategy game Republic: The Revolution.

Trailer

Real-time planning in games with hidden state, using partially observable Monte-Carlo planning (POMCP).

Demo Source NIPS-10  
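
A minimal POMCP-style sketch: the belief state is a set of state particles, and planning runs UCT simulations through a black-box generative model. The toy two-door problem below is illustrative only, not a domain from the paper:

```python
import math
import random

ACTIONS = ["listen", "open-left", "open-right"]

def step(state, action):
    """Hypothetical generative model G(s, a) -> (s', obs, reward, done).
    State 0/1 encodes which door hides the penalty."""
    if action == "listen":
        obs = state if random.random() < 0.85 else 1 - state  # noisy hint
        return state, obs, -1.0, False
    opened = 0 if action == "open-left" else 1
    return state, None, (-100.0 if opened == state else 10.0), True

class Node:
    """Search node for one history; children keyed by action."""
    def __init__(self):
        self.N = 0
        self.children = {}  # action -> {"N": visits, "Q": mean, "obs": {obs: Node}}

def rollout(state, depth):
    """Random-policy playout used to value newly expanded nodes."""
    total = 0.0
    for _ in range(depth):
        state, _, r, done = step(state, random.choice(ACTIONS))
        total += r
        if done:
            break
    return total

def simulate(state, node, depth):
    if depth == 0:
        return 0.0
    if not node.children:                      # expand, then estimate by rollout
        node.children = {a: {"N": 0, "Q": 0.0, "obs": {}} for a in ACTIONS}
        return rollout(state, depth)
    node.N += 1
    best, best_val = None, float("-inf")
    for a, ch in node.children.items():        # UCB1 action selection
        if ch["N"] == 0:
            best = a
            break
        val = ch["Q"] + 2.0 * math.sqrt(math.log(node.N) / ch["N"])
        if val > best_val:
            best, best_val = a, val
    s2, obs, r, done = step(state, best)
    ch = node.children[best]
    total = r if done else r + simulate(s2, ch["obs"].setdefault(obs, Node()), depth - 1)
    ch["N"] += 1
    ch["Q"] += (total - ch["Q"]) / ch["N"]     # incremental mean
    return total

def pomcp_plan(particles, n_sims=2000, depth=10):
    """Plan from a belief represented as unweighted state particles."""
    root = Node()
    for _ in range(n_sims):
        simulate(random.choice(particles), root, depth)
    return max(root.children, key=lambda a: root.children[a]["Q"])
```

Sampling the start state from the particle set is what lets the same UCT machinery handle hidden state: each simulation plays out one hypothesis about the world.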

Monte-Carlo search in Civilization II beats the built-in AI.

Demo Source JAIR IJCAI-11 ACL-11

Real-time strategy games are often plagued by pathfinding problems when large numbers of units move around the map. Cooperative pathfinding allows multiple units to coordinate their routes effectively in both space and time.

Demo AIIDE-05 AIW-06
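
The core idea can be sketched as A* through space-time with a shared reservation table, in the spirit of WHCA*: each unit plans around the routes already reserved by earlier units. The grid format and API below are illustrative; swap conflicts and post-arrival waiting are ignored for brevity:

```python
from heapq import heappush, heappop

def plan(grid, start, goal, reservations, max_t=50):
    """A* over (cell, timestep); cells reserved by earlier units are blocked."""
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])  # Manhattan distance
    frontier = [(h(start), 0, start)]            # entries: (f = t + h, t, cell)
    came = {(start, 0): None}
    while frontier:
        f, t, pos = heappop(frontier)
        if pos == goal:
            path, key = [], (pos, t)
            while key is not None:               # walk parents back to the start
                path.append(key[0])
                key = came[key]
            path.reverse()
            for step, p in enumerate(path):      # reserve the route for later units
                reservations.add((p, step))
            return path
        if t >= max_t:
            continue
        for dx, dy in [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]:  # wait or move
            nxt = (pos[0] + dx, pos[1] + dy)
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                continue
            if grid[nxt[0]][nxt[1]] == 1:        # static obstacle
                continue
            if (nxt, t + 1) in reservations:     # another unit holds this cell then
                continue
            if (nxt, t + 1) not in came:
                came[(nxt, t + 1)] = (pos, t)
                heappush(frontier, (t + 1 + h(nxt), t + 1, nxt))
    return None
```

Because the search state includes time, a unit can wait in place to let another pass, which is exactly the coordination that purely spatial A* cannot express.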

Policy gradient methods learn a neural network controller for an octopus arm with 66 continuous state dimensions and 26 continuous action dimensions.

Demo ICML-14 EWRL-12
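
A minimal policy-gradient (REINFORCE) sketch with a Gaussian policy over a single continuous action, on a one-step toy problem; the octopus-arm task is vastly larger, and everything below (names, reward, hyperparameters) is illustrative:

```python
import random

def reinforce(episodes=3000, alpha=0.02, sigma=0.5):
    """One-parameter Gaussian policy a ~ N(theta, sigma^2) trained by
    REINFORCE on a one-step problem whose reward peaks at a = 2.0."""
    theta = 0.0
    for _ in range(episodes):
        a = random.gauss(theta, sigma)          # sample a continuous action
        r = -(a - 2.0) ** 2                     # illustrative reward
        grad_log_pi = (a - theta) / sigma ** 2  # score function of the Gaussian
        theta += alpha * r * grad_log_pi        # ascend E[r * grad log pi]
    return theta
```

The same score-function trick scales to a neural network outputting a high-dimensional action mean, which is how continuous state and action spaces like the arm's are handled.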

Deep reinforcement learning approaches superhuman performance in poker, without domain knowledge.

arXiv-16

SmooCT wins three silver medals at the Computer Poker Competition.

IJCAI-15 ICML-15

Deep reinforcement learning solves a variety of continuous manipulation and locomotion problems, using a single neural network architecture.

ICLR-16 arXiv-16 NIPS-15 Demo

A single neural network architecture learns to play many different Atari games at human level, directly from video input to joystick output.

Source Nature arXiv-16 AAAI-16 ICLR-16 ICML-DLW-15 NIPS-DLW-13
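
The core training loop combines epsilon-greedy control, an experience-replay buffer, and bootstrapped Q-learning targets. Sketched below with a tabular Q-function on a toy chain MDP standing in for the deep network on pixels; the environment and hyperparameters are illustrative:

```python
import random
from collections import deque

def train(n_states=5, episodes=500, alpha=0.2, gamma=0.9, eps=0.1):
    """Q-learning with experience replay on a chain MDP: move left/right,
    reward 1.0 only for reaching the rightmost state."""
    Q = [[0.0, 0.0] for _ in range(n_states)]    # actions: 0 = left, 1 = right
    replay = deque(maxlen=1000)

    def greedy(qs):                              # argmax with random tie-breaking
        m = max(qs)
        return random.choice([i for i, q in enumerate(qs) if q == m])

    for _ in range(episodes):
        s = 0
        for _ in range(200):                     # cap episode length
            a = random.randrange(2) if random.random() < eps else greedy(Q[s])
            s2 = max(0, s - 1) if a == 0 else s + 1
            done = s2 == n_states - 1
            r = 1.0 if done else 0.0
            replay.append((s, a, r, s2, done))
            # Learn from a random minibatch of stored transitions.
            for bs, ba, br, bs2, bdone in random.sample(replay, min(8, len(replay))):
                target = br if bdone else br + gamma * max(Q[bs2])
                Q[bs][ba] += alpha * (target - Q[bs][ba])
            if done:
                break
            s = s2
    return Q
```

Replaying past transitions breaks the correlation between consecutive samples, which is what makes the bootstrapped targets stable enough to train on.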

AlphaGo defeats a human professional player for the first time, by combining deep neural networks and tree search.

Nature (published) (submitted) (info) ICLR-15 (older)