Have you seen one of these floating around on the internet?

An example of a Captcha found on the internet.

These are called CAPTCHAs. Wikipedia describes them as “a type of challenge-response test used in computing to ensure that the response is not generated by a computer”. In simple words, the web server that serves an image like the one above wants to make sure there is a human being at the other end, and not an automated script or bot of some sort. The assumption is that only a human can read the distorted text, so if the response is correct, the entity that produced it is most likely human.

Now, there are several ongoing efforts to digitize mankind’s physical texts into publicly accessible digital archives. These efforts rely on OCR (optical character recognition) software to convert scans of physical texts into digital content. However, OCR isn’t perfect and often misreads words. Fortunately, it is kind enough to flag the words where it suspects it may have gone wrong.

The question is: is it possible to harness the intelligence of the masses surfing the internet to aid the digitization process?
The answer is: yes, with the assistance of reCAPTCHA.

reCAPTCHA Example

The reCAPTCHA technique is simple: it serves two words for each CAPTCHA. One of the words has been flagged as “not recognizable” by the OCR software during conversion. The other one is known (i.e. it was successfully recognized during OCR). The human user attempts to decipher both words. If the user enters the known word correctly, they have most likely entered the unrecognized word correctly as well. This can be confirmed by serving the same unrecognized word to several users and comparing their answers, which improves the confidence. Ultimately, the agreed-upon reading of the unrecognized word is recorded. That’s one more correct word added to the digital archive.
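
To make the idea concrete, here is a minimal Python sketch of the two-word check and the voting step described above. It is an illustration only and not reCAPTCHA's actual implementation: the class name `UnknownWordTally`, the `min_votes` and `min_agreement` thresholds, and the case-insensitive comparison are all assumptions made for this example.

```python
from collections import Counter
from typing import Optional


def normalize(word: str) -> str:
    """Compare answers case-insensitively and ignore stray spaces (an assumption)."""
    return word.strip().lower()


class UnknownWordTally:
    """Collects human readings for one word the OCR could not recognize."""

    def __init__(self, min_votes: int = 3, min_agreement: float = 0.6) -> None:
        # Hypothetical thresholds: how many answers we want, and what share
        # of them must agree before we accept a reading.
        self.min_votes = min_votes
        self.min_agreement = min_agreement
        self.votes: Counter = Counter()

    def submit(self, control_answer: str, control_truth: str,
               unknown_answer: str) -> bool:
        """Return True if the user passes the CAPTCHA.

        Only the known (control) word decides pass or fail. The answer for
        the unknown word is counted as a vote only when the control is right.
        """
        if normalize(control_answer) != normalize(control_truth):
            return False
        self.votes[normalize(unknown_answer)] += 1
        return True

    def resolved(self) -> Optional[str]:
        """Return the agreed reading, or None if confidence is still too low."""
        total = sum(self.votes.values())
        if total < self.min_votes:
            return None
        word, count = self.votes.most_common(1)[0]
        return word if count / total >= self.min_agreement else None


tally = UnknownWordTally()
tally.submit("morning", "morning", "overlooks")   # control correct, vote counted
tally.submit("mornin", "morning", "nonsense")     # control wrong, vote ignored
tally.submit("Morning", "morning", "overlooks")
tally.submit("morning", "morning", "overlocks")
print(tally.resolved())
```

With the thresholds assumed here, two of the three counted answers agree, so the word is considered resolved. A real system would presumably tune these numbers and weigh answers more carefully.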

Smart!

Read more about reCAPTCHA here.
