An AI bot appears to have smashed Q*bert high scores by discovering and exploiting a bug, a result of the unconventional approach it took to learning the game.
Researchers at Germany’s University of Freiburg were using the isometric platform game Q*bert to test different ways for an AI system to learn and master a game. Normally a player progresses by jumping on cube platforms until they have all changed to a specific color.
However, the bot ‘discovered’ that a particular sequence of moves either froze or cancelled the ‘advance to next level’ trigger, leaving it free to keep jumping and racking up points, apparently without limit.
It appears the bug was only found because the bot was not learning through the more common artificial intelligence approach to games, called “reinforcement learning.” In that approach the bot explores the options available at each stage of the game and builds up an estimate of how likely each one is to lead to a favorable result. Although this still involves millions of simulations, it resembles how humans learn: the goal is to find moves with a high probability of success rather than a single perfect sequence of choices.
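As a rough illustration of “building up an estimate of how likely each option is to succeed,” here is a minimal Python sketch of sample-average value estimation with ε-greedy exploration. It assumes a hypothetical one-step toy (a multi-armed bandit) in which each move succeeds with a fixed probability; the function name and parameters are illustrative, and real game-playing agents learn values across many game states, usually with neural networks.

```python
import random

def estimate_action_values(success_prob, episodes=5000, epsilon=0.1, seed=0):
    """Hypothetical toy: each 'move' i succeeds with probability success_prob[i]."""
    rng = random.Random(seed)
    n = len(success_prob)
    values = [0.0] * n  # running estimate of each move's success rate
    counts = [0] * n    # how many times each move has been tried
    for _ in range(episodes):
        if rng.random() < epsilon:
            a = rng.randrange(n)                       # explore: try a random move
        else:
            a = max(range(n), key=lambda i: values[i])  # exploit: best estimate so far
        reward = 1.0 if rng.random() < success_prob[a] else 0.0
        counts[a] += 1
        # incremental sample average: nudge the estimate toward the observed reward
        values[a] += (reward - values[a]) / counts[a]
    return values

print(estimate_action_values([0.2, 0.5, 0.8]))
```

The agent quickly settles on the move with the best estimated success rate and only occasionally tries the others, which is the “high probability of success” behavior described above.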
In this case the bot used the “evolution strategies” approach. In very simplified terms, this means starting with a random strategy, creating many slightly mutated copies of it, letting each copy play, and using the highest-scoring copies to produce the next generation. As with a family tree, each generation descends from the most successful members of the one before, and each copy is judged only on its final score rather than on how its individual moves look.
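To make the “generations” idea concrete, here is a minimal sketch of a canonical (μ, λ) evolution strategy in Python using NumPy. It is not the Freiburg team's code: the `fitness` function is a hypothetical stand-in for playing one game and returning the score, and the population sizes and mutation strength (`lam`, `mu`, `sigma`) are illustrative assumptions.

```python
import numpy as np

def fitness(params: np.ndarray) -> float:
    # Hypothetical stand-in for "play one game, return the score".
    # Higher is better; the best possible score is at params == [3, -2].
    target = np.array([3.0, -2.0])
    return -float(np.sum((params - target) ** 2))

def canonical_es(dim=2, generations=200, lam=20, mu=5, sigma=0.1, seed=0):
    rng = np.random.default_rng(seed)
    parent = np.zeros(dim)  # the starting strategy
    # Log-rank weights: the best offspring gets the largest say in the next parent.
    weights = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
    weights /= weights.sum()
    for _ in range(generations):
        noise = rng.standard_normal((lam, dim))          # lam random mutations
        offspring = parent + sigma * noise                # mutated copies of the parent
        scores = np.array([fitness(o) for o in offspring])
        best = np.argsort(scores)[::-1][:mu]              # keep the mu highest scorers
        # Next parent: weighted recombination of the best mutations.
        parent = parent + sigma * (weights @ noise[best])
    return parent

print(canonical_es())
```

Calling `canonical_es()` should return parameters close to the hypothetical optimum `[3, -2]`. Selection only ever sees each offspring's total score, never individual moves, which is why a lineage can carry choices that would look poor if judged one at a time.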
The key difference here seems to be that reaching the point where the bug can be triggered requires some choices that looked suboptimal at the time. A bot using reinforcement learning would have fairly quickly concluded that it was heading down a path unlikely to lead to success. The evolution strategies approach wasn’t deterred by this and instead explored deep enough to discover that the bug overrode all the ‘rules’ it had previously figured out about the best strategy.