How did AlphaGo beat a legendary Go master?

It has been 19 years since Deep Blue beat the chess grandmaster Garry Kasparov. However, a computer program with the prowess to outperform world-class Go players seemed an insurmountable challenge before the invention of AlphaGo. AlphaGo's stunning 4-1 victory over Lee Sedol, one of the world's strongest Go players, proved that a machine can perform certain intellectual tasks even better than humans do. A milestone in artificial intelligence (AI), the machine learning techniques behind AlphaGo's algorithm can be applied to fields such as healthcare, the automotive industry and finance for the betterment of society. This report explains the algorithms behind AlphaGo and then discusses future directions.

An introduction to Go

Go is an ancient Chinese board game that originated more than 2,500 years ago. As illustrated in Figure 1, two players place black and white stones alternately, each aiming to surround a larger territory on the board. Despite its simple rules, Go embodies immense possibility and complexity.

Figure 1: Lee Sedol vs AlphaGo Game (source: http://goban.co/boards/822)

The game is often called the "Holy Grail" of AI. The number of legal positions in Go exceeds the number of atoms in the observable universe. Unlike chess, brute-force search is infeasible in Go because of the huge number of candidate moves at each turn. The second challenge lies in the evaluation function, which determines who is winning at a given point. In chess, each piece carries a distinct value, so a position can be evaluated by summing the points on each side. In Go, however, every stone has the same worth, and in the middle of a game it is often unclear whose territory a given region will become.
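The scale of that claim is easy to check. A minimal sketch: each of the 361 intersections of a 19x19 board is empty, black or white, giving a naive upper bound of 3^361 configurations; the count of *legal* positions is smaller (roughly 2.1 x 10^170, per later exhaustive counts), while the atoms in the observable universe are usually estimated at around 10^80. The specific constants here are commonly cited estimates, not figures from this report.

```python
import math

# Naive upper bound: every intersection is empty, black, or white.
upper_bound = 3 ** 361

# Commonly cited order-of-magnitude estimates (assumptions, not exact):
legal_positions_exp = 170   # legal Go positions ~ 2.1e170
atoms_exp = 80              # atoms in the observable universe ~ 1e80

digits = len(str(upper_bound))            # 3**361 has 173 decimal digits
print(f"naive bound: about 10^{digits - 1}")
print(f"legal positions exceed atoms by a factor of ~10^{legal_positions_exp - atoms_exp}")
```

Even the conservative legal-position count dwarfs the atom count by ninety orders of magnitude, which is why exhaustive search is hopeless.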

AlphaGo Algorithm [1]

Figure 2: AlphaGo Algorithm

Firstly, two neural networks are created. The policy network is obtained by supervised learning on records of strong amateur games from an online Go server. Just as students correct their work against answers given by a teacher, supervised learning provides "answers", known as training signals, to the machine. The machine's task is to infer an accurate mapping between input and output. In Go, the input is the board position and the subsequent moves are the training signals; the aim is to deduce a function that gives the next move from the current board configuration. However, this technique only mimics human play and does not guarantee a win. Therefore, a reinforcement learning technique is adopted in which the machine plays against itself to fine-tune the function. This results in a policy network that outputs the most probable move. Using the database of the machine's self-play games and their outcomes, a value network is then trained to predict who is winning at a given point.
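The board-in, move-probabilities-out mapping can be sketched in a few lines. This is a toy linear model with a softmax over the 361 moves, not DeepMind's architecture (their policy network was a deep convolutional net trained on millions of expert positions [1]); the board encoding and the single gradient-descent rule below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
BOARD_POINTS = 19 * 19   # 361 intersections

# Toy "policy network": one linear layer followed by a softmax.
W = rng.normal(scale=0.01, size=(BOARD_POINTS, BOARD_POINTS))

def policy(board):
    """board: length-361 vector (+1 black stone, -1 white, 0 empty).
    Returns a probability for each of the 361 possible next moves."""
    logits = board @ W
    exp = np.exp(logits - logits.max())   # numerically stable softmax
    return exp / exp.sum()

def train_step(board, expert_move, lr=0.1):
    """One supervised-learning step: the expert's move is the training
    signal, and the cross-entropy gradient nudges W so that move
    becomes more probable next time."""
    global W
    p = policy(board)
    grad = np.outer(board, p)       # d(cross-entropy)/dW for linear softmax
    grad[:, expert_move] -= board
    W -= lr * grad

# A fabricated (position, expert move) pair stands in for real game records.
board = rng.integers(-1, 2, size=BOARD_POINTS).astype(float)
before = policy(board)[42]
for _ in range(5):
    train_step(board, expert_move=42)
assert policy(board)[42] > before   # the expert's move became more probable
```

The same loop, run over millions of recorded positions instead of one fabricated pair, is the supervised stage described above; the reinforcement stage replaces the expert's move with moves sampled from self-play.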

Subsequently, the two neural networks are combined with Monte Carlo Tree Search (MCTS) and a rollout policy. MCTS simulates games using a game tree. With reference to Figure 3, a game tree represents states of the game as nodes, and moves in Go as edges connecting successive layers of nodes. Under the rollout policy, moves are initially chosen at random. After each simulation, values such as the likelihood of victory are stored at each node visited. These values then guide move selection in later simulations, reducing the randomness and maximizing the probability of winning. MCTS provides a statistical approach to selecting the best move; combined with the more intuitive judgment of neural networks trained on human game data, it leads to far stronger play.
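The select/expand/simulate/backpropagate loop can be shown on a game far smaller than Go. The sketch below runs plain MCTS with random rollouts and the standard UCB1 selection rule on a toy pile game (players alternately remove 1-3 stones; whoever takes the last stone wins). The toy game and all names here are illustrative choices, and AlphaGo additionally uses its networks to bias selection and evaluation, which this sketch omits.

```python
import math, random

random.seed(1)

def legal_moves(pile):
    return [m for m in (1, 2, 3) if m <= pile]

class Node:
    def __init__(self, pile, to_move, parent=None):
        self.pile, self.to_move, self.parent = pile, to_move, parent
        self.children = {}   # move -> child Node
        self.visits = 0
        self.wins = 0.0      # wins for the player who moved into this node

def ucb(child, c=1.4):
    # UCB1: exploit high win rates, explore rarely visited children.
    return (child.wins / child.visits
            + c * math.sqrt(math.log(child.parent.visits) / child.visits))

def rollout(pile, to_move):
    """Random playout to the end of the game; returns the winner."""
    while True:
        pile -= random.choice(legal_moves(pile))
        if pile == 0:
            return to_move          # taking the last stone wins
        to_move = 1 - to_move

def mcts(pile, iterations=3000):
    root = Node(pile, to_move=0)
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded nodes via UCB1.
        while node.pile > 0 and len(node.children) == len(legal_moves(node.pile)):
            node = max(node.children.values(), key=ucb)
        # 2. Expansion: add one untried child unless the game is over.
        if node.pile > 0:
            m = random.choice([x for x in legal_moves(node.pile)
                               if x not in node.children])
            node.children[m] = Node(node.pile - m, 1 - node.to_move, parent=node)
            node = node.children[m]
        # 3. Simulation: random playout from the new node.
        if node.pile == 0:
            winner = 1 - node.to_move   # previous mover took the last stone
        else:
            winner = rollout(node.pile, node.to_move)
        # 4. Backpropagation: update visit and win counts up to the root.
        while node is not None:
            node.visits += 1
            if node.parent is not None and winner != node.to_move:
                node.wins += 1          # credit the player who moved here
            node = node.parent
    # Play the most visited move at the root.
    return max(root.children, key=lambda m: root.children[m].visits)

print(mcts(3))   # → 3: taking all three stones wins immediately
```

Early iterations are nearly random, exactly as the rollout policy prescribes; as visit and win statistics accumulate at the nodes, UCB1 concentrates later simulations on promising lines.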

Figure 3: a small section of game tree in Go (Source: https://blogs.loc.gov/maps/category/game-theory/)

What distinguishes AlphaGo from competing programs is the higher level of AI involved. Programs such as Zen, Pachi [2] and Fuego [3] are all sophisticated variants of MCTS, but they lack the deep learning component that is deemed essential for a machine to acquire human-like intuition and creativity. As a result, they are unable to beat professional players without handicaps. Unsurprisingly, AlphaGo won 494 out of 495 games when tested against other Go programs [1].

Future Directions

The success of AlphaGo paves the way for an era of artificial general intelligence, in which a machine's ability is not restricted to one specific task but extends to any intellectual task a human being can perform. Though AlphaGo is just a Go program, the principle of letting a machine learn for itself rather than imposing hand-crafted knowledge can be applied to many fields. For example, AlphaGo's win has inspired the development of smartphone assistants that provide services tailored to the user; sophisticated AI is needed to infer a user's needs from his or her habits. Moreover, DeepMind, the company that created AlphaGo, recently deployed a data-tracking app in collaboration with the National Health Service in the UK, which aims to extract useful data and serve patients better. The challenges for future progress involve reducing human intervention and developing general-purpose automated reasoning systems.

In conclusion, AlphaGo combines two neural networks with Monte Carlo Tree Search to maximize its probability of winning. With advanced deep learning techniques it outperformed Lee Sedol as well as rival computer programs, a major breakthrough in the development of AI. The potential of AI to assist our decision-making and transform the world is yet to be fully discovered.

References

[1] Silver D, Huang A, Maddison CJ, Guez A, Sifre L, Van Den Driessche G, Schrittwieser J, Antonoglou I, Panneershelvam V, Lanctot M, Dieleman S. Mastering the game of Go with deep neural networks and tree search. Nature. 2016 Jan 28;529(7587):484-9.

[2] Baudiš P, Gailly JL. Pachi: State of the art open source Go program. In: Advances in Computer Games. Springer Berlin Heidelberg; 2011 Nov 20. pp. 24-38.

[3] Enzenberger M, Muller M, Arneson B, Segal R. Fuego—an open-source framework for board games and Go engine based on Monte Carlo tree search. IEEE Transactions on Computational Intelligence and AI in Games. 2010 Dec;2(4):259-70.

[4] Christopher B. Google DeepMind's AlphaGo: How it works. TasteHit; 16 March 2016. Available from: https://www.tastehit.com/blog/google-deepmind-alphago-how-it-works/ [Accessed 4 Dec 2016].
