If you thought Google couldn’t possibly take control of every aspect of the internet, you were wrong.
If you’ve ever tried to sign up to a website or post a comment, and been asked to type in a word or numbers that appear in a picture – or worse, try to figure out the difference between a virtually indistinguishable cat and dog – you’ve encountered a CAPTCHA.
Technically standing for Completely Automated Public Turing test to tell Computers and Humans Apart (though that was simply made up afterwards for the sake of the pun), it’s designed to make it impossible, or at least troublesome, for a computer to automatically figure out the correct answer to the barely readable text, and thus to prevent automated systems from leaving spam on websites.
Google has now bought out RECAPTCHA, the biggest firm producing such tools, having found a creative use for the firm’s technology. Many of the images used as CAPTCHAs are taken from scans of newspapers and books which are legible but have faded too much for traditional optical character recognition (thus making it difficult for spammers to create programs which can read and “solve” them).
However, RECAPTCHA has now wound up with a massive database of both the images and the words they represent. Google believes that database could be the key to improving its own scanning software so that it can do a better job of reading text that is in poor condition.
As well as scanning pages for its controversial Google Books scheme, the company will also use the technology for scanning older newspapers for inclusion in the archive section of Google News.
Of course, once Google’s system perfects the art of reading such text, the firm will have to keep the scanning technology under tight control to avoid spammers using the solution to make CAPTCHAs worthless.
There may be unintended legal consequences of the purchase. There are arguments that using a CAPTCHA on a website without providing an audio version of the test can breach various national laws on disability discrimination. While RECAPTCHA avoided major legal problems, a civil action may be more likely now that the technology is owned by a giant such as Google.
(EDIT: Thanks to reader Luke Faraone for alerting us that, unlike some firms, RECAPTCHA “supplies an audio version of their captcha by default in their widget.”)