Most if not all internet users are familiar with CAPTCHAs. They are possibly one of the most infuriating things ever. Every time you go to share something from Stumble on your Facebook wall, every time you think you’re almost done signing up for something, they spring up, throwing one more task at the ever-in-a-hurry, short-attention-span user (an ever-growing demographic). However, though they can be quite annoying (and at times downright illegible), they do serve a purpose. They help fend off the only thing about the internet that is more annoying than CAPTCHAs themselves: spam. Let’s face it, if you’re running an online service or subscribing to one, spam can be a real hurdle. And the simple step of asking a user to type out a piece of text can keep a lot of spam at bay. This blog itself receives dozens of spam comments per week (that number is low because for now I have a fairly limited web presence). This makes it tough to balance convenience and security. On the one hand, I would prefer not to have to go through these annoying spam comments, but once in a while a valid comment from an unknown poster does get dragged into the net. On the other hand, I personally understand that many people read and comment on the fly and are deterred from doing so if it entails a further layer of verification.
There is another side to these CAPTCHAs though, one which some of you out there may already be familiar with, but which I only recently stumbled upon. reCAPTCHA, a free online service, uses CAPTCHAs to digitize old books, newspapers and other media. The book pages are scanned and then transformed into text using “Optical Character Recognition” (OCR). The only problem is, it’s not perfect. So they ask individual users to help them out. Sounds great, right? I mean, why not contribute to the endeavour to preserve humanity’s literary culture and history while you keep away unwanted online messages? But is it really as simple as all that? Google provides this service free of charge to any online service. Think of the number of different sites that are using it. According to the reCAPTCHA site, about 200 million CAPTCHAs are solved by humans worldwide every day! This means a lot of free labour for Google’s book reader project. I mean, would you stand for this in the real world, if someone asked you to come by and digitize books for them, free of charge, even if it was just, say, a few dozen words per day? Okay, so maybe it’s not actually feasible to reward individuals for their efforts, but it still feels a bit unfair for you to help digitize a book for Google, and then maybe pay to read it later.
Google’s reCAPTCHA uses a two-word CAPTCHA verification, one word of which is usually easily legible while the other is distorted. The first of these is a control word: they have already digitized it, so it is the one your answer is actually checked against. The second word, the distorted one, is the one that they need your help with, so even if you mess it up it won’t affect the verification. Their OCR software could not reliably read that distorted or angled word from the image produced by scanning the page. Once enough people have typed the word and they have a consistent answer, they assign it to that word (I’m not sure how many it actually takes). Theoretically then, if enough people wrote the same wrong word for that particular image, it would be recorded in Google’s data banks as the answer. That is exactly what a blog I recently read suggested, prompting people to type in a particular derogatory word, used to refer to people of African descent, in an effort to have that word then show up in digitized copies of books worldwide. The standard two-word CAPTCHA isn’t the only one you encounter, of course. There are other options: ones that contain an image of a number and an image of a distorted word; ones that pose a question and have a drop-down list of answers; ones that are quotes or pop culture references. These don’t work for Google’s digitizing project (I believe) but serve the purpose of verification to keep spamming at bay.
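To make the two-word idea a bit more concrete, here is a rough Python sketch of how a check like this could work. It is only my guess at the general shape, not Google’s actual code: the names `UnknownWord` and `check_captcha` are made up for illustration, and the threshold of 3 matching answers is an assumption, since I don’t know the real number.

```python
from collections import Counter

# Hypothetical threshold: the real number of matching answers is not known to me.
AGREEMENT_THRESHOLD = 3


def normalize(word):
    """Compare answers case-insensitively and ignore surrounding spaces."""
    return word.strip().lower()


class UnknownWord:
    """A scanned word the OCR could not read, plus the guesses collected so far."""

    def __init__(self):
        self.votes = Counter()
        self.accepted = None  # filled in once enough users agree

    def record(self, guess):
        guess = normalize(guess)
        if not guess or self.accepted is not None:
            return  # ignore empty guesses and words that are already settled
        self.votes[guess] += 1
        if self.votes[guess] >= AGREEMENT_THRESHOLD:
            self.accepted = guess


def check_captcha(control_answer, control_guess, unknown, unknown_guess):
    """Pass or fail the user on the control word only.

    The guess for the unknown word is counted as a vote only when the
    control word was typed correctly.
    """
    passed = normalize(control_guess) == normalize(control_answer)
    if passed:
        unknown.record(unknown_guess)
    return passed


if __name__ == "__main__":
    word = UnknownWord()
    for guess in ["upon", "Upon", "open", "upon"]:
        check_captcha("morning", "morning", word, guess)
    print(word.votes, word.accepted)
```

Notice that nothing in this sketch checks whether the agreed answer is actually right. It only checks that enough people typed the same thing, which is why the coordinated wrong-word prank described above could work in theory.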
To some extent though, I suppose this is a moot point. This tech has been running for ages. And for the most part, despite how it may annoy us, we’ve made our peace with it. CAPTCHA exists, and if the site you’re using employs it, there’s really no way around it. The question is whether we’re going to see better methods of online verification in the future (I’m betting Google hopes not).
For more information on the reCAPTCHA service and how to use it for yourself, go here: Google’s reCAPTCHA