31 Aug 2010

Digitizing Books One Word at a Time

reCAPTCHA is a free CAPTCHA service that helps to digitize books, newspapers and old-time radio shows. Check out our paper in Science about it (or read more below).


A CAPTCHA is a program that can
tell whether its user is a human or a computer. You've probably seen
them — colorful images with distorted text at the bottom of Web
registration forms. CAPTCHAs are used by many websites to prevent abuse
from "bots," or automated programs usually written to generate spam. No
computer program can read distorted text as well as humans can, so bots
cannot navigate sites protected by CAPTCHAs.

About 200 million CAPTCHAs are solved by humans around the world every
day. In each case, roughly ten seconds of human time are being spent.
Individually, that's not a lot of time, but in aggregate these little
puzzles consume more than 150,000 hours of work each day. What if we
could make positive use of this human effort? reCAPTCHA does exactly
that by channeling the effort spent solving CAPTCHAs online into
"reading" books.

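As a rough check on the aggregate figure above, the arithmetic can be sketched in a few lines of Python. The two inputs are the rounded numbers quoted in the text; the function name `daily_human_hours` is only illustrative. With these rounded inputs the total comes to roughly 555,000 hours per day, which sits comfortably above the "more than 150,000 hours" quoted, so that figure is best read as a conservative lower bound.

```python
# Back-of-the-envelope estimate using the rounded figures quoted in the text.
CAPTCHAS_PER_DAY = 200_000_000   # "about 200 million" solved per day
SECONDS_PER_CAPTCHA = 10         # "roughly ten seconds" each


def daily_human_hours(captchas_per_day: int, seconds_each: float) -> float:
    """Total human time spent per day, converted from seconds to hours."""
    return captchas_per_day * seconds_each / 3600


hours = daily_human_hours(CAPTCHAS_PER_DAY, SECONDS_PER_CAPTCHA)
print(f"{hours:,.0f} hours per day")  # prints "555,556 hours per day"
```
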