Nature reports on an artificial-intelligence breakthrough, a computer that has learned to win at the complex Asian board game Go without studying the strategies and past games of human players:
Previous Go-playing computers developed by DeepMind, which is based in London, began by training on more than 100,000 human games played by experts. The latest program, known as AlphaGo Zero, instead starts from scratch using random moves, and learns by playing against itself. After 40 days of training and 30 million games, the AI was able to beat the world’s previous best ‘player’ — another DeepMind AI known as AlphaGo Master.
Like its predecessors, AlphaGo Zero uses a deep neural network — a type of AI inspired by the structure of the brain — to learn abstract concepts from the boards. Told only the rules of the game, it learns by trial and error, feeding back information on what worked to improve itself after each game.
At first, AlphaGo Zero’s learning mirrored that of human players. It started off trying greedily to capture stones, as beginners often do, but after three days it had mastered complex tactics used by human experts. “You see it rediscovering the thousands of years of human knowledge,” said [DeepMind CEO Demis] Hassabis. After 40 days, the program had found plays unknown to humans.
It still required a huge amount of computing power — four of the specialized chips called tensor processing units, which Hassabis estimated to be US$25 million of hardware. But its predecessors used ten times that number. It also trained itself in days, rather than months. The implication is that “algorithms matter much more than either computing or data available”, said [AlphaGo developer David] Silver.
The team hopes the same system can be applied to figuring out how to fit proteins together – an important part of creating new medicines.
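The self-play loop the article describes (start from random moves, play against yourself, feed back what worked after each game) can be illustrated on a much smaller game. What follows is a minimal sketch, not DeepMind's method: it swaps the deep neural network for a plain lookup table and swaps Go for a toy stone-taking game (one pile, each player removes one to three stones, whoever takes the last stone wins). The game, the function names `train` and `best_move`, and every parameter value are assumptions chosen for illustration.

```python
import random
from collections import defaultdict

MAX_TAKE = 3  # assumed toy rule: a player may remove 1, 2, or 3 stones per turn


def legal_moves(pile):
    return range(1, min(MAX_TAKE, pile) + 1)


def best_move(q, pile):
    # Greedy choice: the move with the highest learned value estimate.
    return max(legal_moves(pile), key=lambda take: q[(pile, take)])


def train(games=20000, epsilon=0.1, max_pile=12, seed=0):
    """Learn move values purely from self-play, knowing only the rules."""
    rng = random.Random(seed)
    q = defaultdict(float)     # (pile, take) -> average outcome for the mover
    visits = defaultdict(int)  # (pile, take) -> number of times tried
    for _ in range(games):
        pile = rng.randint(1, max_pile)
        history = [[], []]     # moves made by player 0 and by player 1
        player = 0
        while pile > 0:
            # Both sides share one table, so the program plays itself.
            if rng.random() < epsilon:
                take = rng.choice(list(legal_moves(pile)))  # trial and error
            else:
                take = best_move(q, pile)
            history[player].append((pile, take))
            pile -= take
            winner = player    # whoever moved last took the final stone
            player = 1 - player
        # After each game, feed the result back into every move that was played.
        for p in (0, 1):
            outcome = 1.0 if p == winner else -1.0
            for key in history[p]:
                visits[key] += 1
                q[key] += (outcome - q[key]) / visits[key]
    return q
```

Calling `q = train()` and then `best_move(q, 5)` asks the trained table what to do with five stones left. The known winning strategy for this toy game is to leave the opponent a multiple of four stones, and the table has to find that on its own from wins and losses, since nothing in the code states it.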