I Am Not a Robot | How and Why CAPTCHA Works

Source: Auth0

Picture this: just when you thought you had successfully entered all of your personal (and totally accurate) information, and you could see the verification email on the horizon, it all comes crashing down. You just got CAPTCHA'd. "Select all images with a boat," it says. It starts out innocently enough, but then you gradually realize that half of the images you clicked were blurry, sideways pictures of vans. But you soldier on, you make it through, you click that sweet, sweet Verify button... and there's another page. That's right, you have to do it all over again. This is all extremely dramatic, but there's no denying it: it's annoying. Sometimes you get off easy with one of those benevolent little checkboxes; sometimes you get the misshapen words. So what is CAPTCHA, why do we have it, and how does it work?

Where It Began

When doing research for this ramble, I had a lot of trouble finding out who actually invented CAPTCHA. In most places you will find that Luis von Ahn is credited with inventing it. However, if you read the incredibly detailed Wikipedia article on CAPTCHA, you will find that the answer is not as clear as other publications make it seem.

"Two teams have claimed to be the first to invent the CAPTCHAs used widely on the web today. The first team with Mark D. Lillibridge, Martín Abadi, Krishna Bharat, and Andrei Broder, used CAPTCHAs in 1997 at AltaVista to prevent bots from adding Uniform Resource Locator (URLs) to their web search engine. Looking for a way to make their images resistant to optical character recognition (OCR) attack, the team looked at the manual of their Brother scanner, which had recommendations for improving OCR's results (similar typefaces, plain backgrounds, etc.). The team created puzzles by attempting to simulate what the manual claimed would cause bad OCR.[citation needed]" (Sourced from Wikipedia)

Read on its own, this excerpt copied directly from Wikipedia would make you think that von Ahn was incorrectly credited with the invention of CAPTCHA. That is, until you finish reading the paragraph and notice the glaring "citation needed" link. That's right! I really couldn't find these details anywhere else, although a few other articles (far fewer, though) did mention this first team. The only thing that proves this team actually invented something is a 1997 patent, which is extremely vague, and as near as I can tell, the team really did nothing further with the idea. It seems to be more of a rough claim to an idea than an invention.

The story of von Ahn & Co. is much clearer. In 2000, Yahoo Mail was having a problem: spam. Every day, attackers would use programs to sign up for and manage hundreds, thousands, even millions of spam and malicious email accounts. To stop this, von Ahn came up with and built a way to tell whether someone was a human or a bot. Before registering an account, the user would have to read warped or otherwise obscured (but still readable) text and type in what they read. This is when the term CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) was coined, a name that never came up in the original 1997 patent. And yes, this is definitely an acronym that was made to fit.
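To make the idea a bit more concrete, here's a minimal sketch of the server side of a text CAPTCHA. It assumes a Python backend. The server generates random text, remembers it, and checks what the user types back. Everything here is hypothetical and only illustrates the general idea: the alphabet, `new_challenge`, `check_answer`, and keeping an HMAC instead of the plain text. It is not Yahoo's or von Ahn's actual code. It also skips the genuinely hard part, which is warping the text into an image that OCR can't read but a person can.

```python
import hashlib
import hmac
import secrets

# Hypothetical alphabet: leaves out look-alike characters (0/O, 1/I/L) so a
# human reading warped text isn't tripped up by genuine ambiguity.
ALPHABET = "ABCDEFGHJKMNPQRSTUVWXYZ23456789"
SERVER_KEY = secrets.token_bytes(32)  # assumption: a per-server secret key

# challenge id -> keyed digest of the expected answer (never the plain text)
_pending: dict[str, bytes] = {}


def _digest(challenge_id: str, text: str) -> bytes:
    # Bind the answer to one specific challenge, case-insensitively.
    msg = f"{challenge_id}:{text.upper()}".encode()
    return hmac.new(SERVER_KEY, msg, hashlib.sha256).digest()


def new_challenge(length: int = 6) -> tuple[str, str]:
    """Return (challenge_id, text); the text would then be warped into an image."""
    challenge_id = secrets.token_hex(8)
    text = "".join(secrets.choice(ALPHABET) for _ in range(length))
    _pending[challenge_id] = _digest(challenge_id, text)
    return challenge_id, text


def check_answer(challenge_id: str, answer: str) -> bool:
    """One attempt per challenge: the stored digest is discarded either way."""
    expected = _pending.pop(challenge_id, None)
    if expected is None:
        return False
    return hmac.compare_digest(expected, _digest(challenge_id, answer.strip()))
```

Each challenge is single-use, so a bot can't keep guessing at the same image. It has to request a new one every time.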

reCAPTCHA

Now owned by Google (not surprising), reCAPTCHA was started at Carnegie Mellon by a group that included, you guessed it, Luis von Ahn. Think of reCAPTCHA as the proprietary, brand-name version. It was a streamlined widget with the wonky-looking letters that could be embedded on any website. Initially, reCAPTCHA was designed to help digitize books. Each challenge showed one known word and one unknown word that OCR software had failed to read. Typing the known word correctly proved you were human, and your answer for the unknown word was recorded as a vote on what it said. Essentially, millions of people were digitizing books two words at a time.
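Here's a toy model of that known-word/unknown-word trick in Python. It is a sketch under my own assumptions, not how Carnegie Mellon or Google actually implemented it. The class name `WordPairCaptcha`, the `votes_needed` threshold, and the simple majority-vote rule are all hypothetical. The key idea it shows is that only the known (control) word decides whether you pass. Your answer for the unknown word is just one vote, and the word counts as digitized once enough people agree.

```python
import random
from collections import Counter, defaultdict


class WordPairCaptcha:
    """Toy model of the original reCAPTCHA idea (hypothetical names and thresholds)."""

    def __init__(self, known: dict[str, str], unknown_ids: list[str], votes_needed: int = 3):
        self.known = known                # image id -> word we already know
        self.unknown = list(unknown_ids)  # image ids that OCR couldn't read
        self.votes: dict[str, Counter] = defaultdict(Counter)
        self.solved: dict[str, str] = {}  # image id -> agreed-upon word
        self.votes_needed = votes_needed

    def challenge(self, rng: random.Random) -> tuple[str, str]:
        """Pick one control word and one unknown word to show together."""
        control = rng.choice(list(self.known))
        target = rng.choice(self.unknown)
        return control, target

    def submit(self, control: str, target: str, control_answer: str, target_answer: str) -> bool:
        # Pass/fail depends only on the word we already know.
        if control_answer.strip().lower() != self.known[control].lower():
            return False
        # The user passed, so trust their reading of the unknown word as one vote.
        guess = target_answer.strip().lower()
        self.votes[target][guess] += 1
        word, count = self.votes[target].most_common(1)[0]
        if count >= self.votes_needed and target not in self.solved:
            self.solved[target] = word
            self.unknown.remove(target)  # retire the word once it's been digitized
        return True
```

For example, with `votes_needed=2`, two people who both pass the control word and both read the unknown image as "castle" would settle that word.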

It turns out, though, that robots get smarter. With errors and concerns starting to surface by 2014, Google used its machine learning algorithms to test how solvable its current generation of puzzles was, and found it could solve even the hardest ones 99.8% of the time. This spawned the No CAPTCHA reCAPTCHA initiative, in which Google started rolling out a next generation that used the checkbox that is now so familiar.

The checkbox doesn't actually test whether you can check a box. Instead, it looks at your mouse movements and behavior on the site before the box is checked, and it can also check your past browsing data. It can't always reach a conclusion this way, so it will often fall back to the image challenge (AKA the "click all of the squares with X" type of thing).
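Google doesn't publish what the checkbox actually measures, so the following is purely a toy illustration of the *kind* of behavioral signal such a check could use. A cursor that glides to the box in a perfectly straight line, at perfectly even time intervals, looks scripted. Real human movement wobbles and hesitates. The function name `looks_scripted`, its input format, and both cutoffs are made up for this example.

```python
import math
import statistics


def looks_scripted(points: list[tuple[float, float, float]],
                   straightness_cutoff: float = 0.99,
                   timing_cv_cutoff: float = 0.05) -> bool:
    """points: hypothetical (x, y, t_seconds) mouse samples before the click.

    Flags a path as bot-like when it is almost perfectly straight AND the
    time gaps between samples are almost perfectly uniform. The cutoffs are
    invented for illustration.
    """
    if len(points) < 3:
        return True  # barely any movement before the click is suspicious too

    # Total distance travelled vs. the straight-line distance start -> end.
    path = sum(math.dist(points[i][:2], points[i + 1][:2]) for i in range(len(points) - 1))
    direct = math.dist(points[0][:2], points[-1][:2])
    straightness = direct / path if path > 0 else 1.0

    # Coefficient of variation of the gaps between samples (0 = perfectly even).
    gaps = [points[i + 1][2] - points[i][2] for i in range(len(points) - 1)]
    mean_gap = statistics.mean(gaps)
    timing_cv = statistics.pstdev(gaps) / mean_gap if mean_gap > 0 else 0.0

    return straightness >= straightness_cutoff and timing_cv <= timing_cv_cutoff
```

A single rule like this would be trivial to fool by adding random jitter, which is presumably why the real system mixes many signals, including your browsing history.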

There's Just One Big Problem

The harder they make CAPTCHAs to solve, the better the "bots" get at solving them. The Verge article I read while researching this ramble (linked below) goes into a long diatribe about what it means to be human, and how bots are vastly superior to us at specialized things like image recognition. But they do have a point: these tests have to be accessible to people all over the world, and they have to be standardized, which makes them the perfect thing to train AI to solve.

Oh yeah, and apparently there are CAPTCHA farms, where loads of low-paid workers sit and solve CAPTCHAs for fraudulent accounts all day.

The Next Version

Version 3 of reCAPTCHA has no button or other sign of its presence. Instead, it watches users' actions on a site (plus their past browser cookies and behavior) and assigns them a risk score, which the site can use to flag potential bots or suspicious behavior. This freaks me out a little bit, but it will most likely be pretty effective, and it's a good way of thinking outside of the box (quite literally) to solve a problem.
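On the website's side, v3 hands the page a token. The site's server forwards that token to Google's `siteverify` endpoint. Google sends back JSON that includes a `success` flag, a `score` between 0.0 (very likely a bot) and 1.0 (very likely a human), and the `action` name the page reported. Below is a minimal Python sketch of that server-side step. The endpoint and field names follow Google's reCAPTCHA documentation. The 0.5 threshold is Google's suggested starting point. The function names `fetch_verdict` and `decide`, and the allow/challenge/reject policy, are hypothetical choices for this example.

```python
import json
import urllib.parse
import urllib.request

SITEVERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def fetch_verdict(secret: str, token: str) -> dict:
    """POST the browser's token (plus the site's secret key) to Google for scoring."""
    data = urllib.parse.urlencode({"secret": secret, "response": token}).encode()
    with urllib.request.urlopen(SITEVERIFY_URL, data=data, timeout=10) as resp:
        return json.load(resp)


def decide(verdict: dict, expected_action: str, threshold: float = 0.5) -> str:
    """Turn Google's verdict into a site-specific decision (the policy is our own)."""
    if not verdict.get("success") or verdict.get("action") != expected_action:
        return "reject"  # invalid token, or a token minted for a different action
    if verdict.get("score", 0.0) >= threshold:
        return "allow"
    # A low score doesn't have to mean a hard block: ask for extra verification.
    return "challenge"
```

The score is only a signal. Each site decides for itself what a "suspicious" score should trigger, such as an email confirmation, a second factor, or a manual review.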

Are You Human?

Tidbit:

Florida man bit off part of friend's ear during Key West hotel brawl, authorities say

Well, I guess that wraps things up, doesn't it...

Sources:

I really, really try to give the whole story when writing these, and I think it's important to provide sources, especially since some outlets (definitely not Business Insider) had some inaccuracies, or at least some oversimplifications.

https://www.businessinsider.com/duolingo-ceo-invented-captcha-gave-to-yahoo-for-free-2018-6

https://www.theverge.com/2019/2/1/18205610/google-captcha-ai-robot-human-difficult-artificial-intelligence

https://patents.google.com/patent/US20050114705A1/en?q=discriminating&q=human&inventor=raanan

https://phys.org/news/2012-06-captcha-story-squiggly-letters.html

https://www.wired.com/2014/12/google-one-click-recaptcha/

https://www.fastcompany.com/90369697/googles-new-recaptcha-has-a-dark-side


Lorenzo Rambles
