A Google-backed experiment showed that a computer could learn to play 49 Atari 2600 games and, in many cases, outperform a human player. In some ways it’s more impressive than even chess-playing supercomputers.
Systems like Deep Blue refine their play through experience, but they were programmed with the rules of chess from the outset. The computer used by Google’s DeepMind project was told nothing about the games other than that the aim was to get the highest score. It had to figure everything else out by watching the movement of the pixels and observing what effect its control inputs had on the score.
The researchers compared the computer’s performance against that of human players, as well as checking what happened with completely random control inputs. The self-learning computer performed above human level in 29 of the 49 games, with “human level” defined as achieving at least 75 per cent of the score reached by a “professional human games tester”. In three cases its performance was more than 1,000 per cent better: Breakout, Boxing and Video Pinball.
However, it struggled with the likes of Ms Pac-Man and Asteroids, while its performance on Montezuma’s Revenge was no better than issuing random commands.
The researchers noted that the subject and genre of the game didn’t seem to make much difference to the computer’s performance. Instead the results seemed, perhaps logically enough, to be linked to how far in advance a player needed to plan the best strategy before executing it.
In a study published in Nature, the researchers said the experiment demonstrated that:
…a single architecture can successfully learn control policies in a range of different environments with only very minimal prior knowledge, receiving only the pixels and the game score as inputs, and using the same algorithm, network architecture and hyperparameters on each game, privy only to the inputs a human player would have.
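To make that idea concrete: the agent is told only the current screen and the score, and must learn by trial and error which actions raise the score. The sketch below is a heavily simplified illustration of that setup, not DeepMind’s actual system (which used a deep convolutional network); it runs tabular Q-learning on an invented toy “screen” where the agent sees a strip of pixels and is rewarded for reaching the right edge. All names, the environment, and the hyperparameters here are assumptions made up for demonstration.

```python
import random

WIDTH = 5  # toy "screen": positions 0..4, with a reward at the right edge

def render(pos):
    # The raw "pixels": a strip of 0s with the agent drawn as a 1.
    return tuple(1 if i == pos else 0 for i in range(WIDTH))

def step(pos, action):
    # Move left (action 0) or right (action 1); score +1 at the right edge.
    pos = max(0, min(WIDTH - 1, pos + (1 if action == 1 else -1)))
    reward = 1.0 if pos == WIDTH - 1 else 0.0
    return pos, reward, pos == WIDTH - 1

def greedy(q, obs, rng):
    # Pick the higher-valued action for this screen, breaking ties at random.
    q0, q1 = q.get((obs, 0), 0.0), q.get((obs, 1), 0.0)
    if q0 == q1:
        return rng.randrange(2)
    return 0 if q0 > q1 else 1

def train(episodes=2000, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {}  # (pixels, action) -> estimated future score
    for _ in range(episodes):
        pos = 0
        for _ in range(20):
            obs = render(pos)
            # Mostly exploit what has been learned; occasionally explore.
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = greedy(q, obs, rng)
            pos, reward, done = step(pos, action)
            nxt = render(pos)
            # Q-learning update: nudge the estimate toward reward plus
            # the discounted value of the best follow-up action.
            target = reward if done else reward + gamma * max(
                q.get((nxt, a), 0.0) for a in (0, 1))
            key = (obs, action)
            q[key] = q.get(key, 0.0) + alpha * (target - q.get(key, 0.0))
            if done:
                break
    return q

q = train()
start = render(0)
best = greedy(q, start, random.Random(1))
print(best)  # after training, the learned policy heads right (action 1)
```

The point of the toy is the same constraint the quote describes: the agent never sees the game’s rules, only pixels in and score out. DeepMind’s contribution was making this work at scale by replacing the lookup table with a deep network trained with the same algorithm and hyperparameters across all 49 games.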
They added that to further test whether a computer could recreate the learning abilities of the human brain, they would need to run studies that presented a wider range of information, of vastly varying importance, and see whether it could figure out which inputs really mattered.
The long-term goal is to produce robotic devices that are better able to cope with unexpected events that aren’t part of their original programming.