
I am Not a Robot | How and Why Captcha Works

Source: Auth0

Picture this: just when you thought you had successfully entered all of your personal (and totally accurate) information, and you could see the verification email on the horizon, it all comes crashing down. You just got CAPTCHA'd. "Select all of the images that include a boat," it says. It starts out innocently enough, but then you gradually realize that half of the images you clicked were blurry, sideways pictures of vans. But you soldier on, you make it through, you click that sweet, sweet Verify button... and there's another page. That's right, you have to do it all over again. This is all extremely dramatic, but there is no denying it's annoying. Sometimes you get off easy with one of those benevolent little checkboxes; sometimes you get the misshapen words. What is CAPTCHA, why do we have it, and how does it work?

Where It Began 

When doing research for this ramble, I had a lot of trouble finding out who actually invented CAPTCHA. In most places you will find that Luis von Ahn is credited with inventing it. However, if you read the incredibly detailed Wikipedia article on CAPTCHA, you will find that the answer is not as clear as other publications make it seem.

"Two teams have claimed to be the first to invent the CAPTCHAs used widely on the web today. The first team with Mark D. Lillibridge, Martín Abadi, Krishna Bharat, and Andrei Broder, used CAPTCHAs in 1997 at AltaVista to prevent bots from adding Uniform Resource Locator (URLs) to their web search engine. Looking for a way to make their images resistant to optical character recognition (OCR) attack, the team looked at the manual of their Brother scanner, which had recommendations for improving OCR's results (similar typefaces, plain backgrounds, etc.). The team created puzzles by attempting to simulate what the manual claimed would cause bad OCR.[citation needed]" (Sourced from Wikipedia)

This excerpt, copied directly from Wikipedia, would make you think that von Ahn was incorrectly credited with the invention of CAPTCHA. That is, until you finish reading the paragraph and notice the glaring "citation needed" link. That's right! I really couldn't find these details anywhere else. There were other articles (far fewer, though) that did mention this first team, but the only thing that proves this team actually invented something is a 1997 patent. That patent is extremely vague, and as near as I can tell, the team really did nothing with their idea. It seems to be more of a rough idea claim than an invention.


The story of von Ahn & Co. is much clearer. In 2000, Yahoo Mail was having a problem: spam. Every day, attackers would use programs to sign up for and manage hundreds, thousands, and even millions of spam and malicious email accounts. To stop this, von Ahn came up with a way to tell whether someone was a human or a bot. Before registering an account, the user would have to read warped or otherwise obscured (but still readable) text and then type what they read. This is when the term CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) was coined, a name that never came up in the original 1997 patent. And yes, this is definitely an acronym that was made to fit.
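The bookkeeping behind a text CAPTCHA like this is surprisingly simple, so here is a rough sketch in Python. It is a hypothetical, minimal version: the part that actually warps the text into an image is skipped entirely, and the names `new_challenge` and `check_answer` are made up for illustration, not taken from Yahoo's or von Ahn's code.

```python
import secrets
import string

# Hypothetical sketch of the challenge/response logic behind a text CAPTCHA.
# Rendering the text as a warped image is omitted; only the bookkeeping is shown.

ALPHABET = string.ascii_uppercase + string.digits
pending: dict[str, str] = {}  # challenge id -> expected text (kept server-side only)


def new_challenge(length: int = 6) -> tuple[str, str]:
    """Create a challenge; return (challenge_id, text to render as a distorted image)."""
    challenge_id = secrets.token_hex(8)
    text = "".join(secrets.choice(ALPHABET) for _ in range(length))
    pending[challenge_id] = text
    return challenge_id, text


def check_answer(challenge_id: str, answer: str) -> bool:
    """Accept if the typed answer matches; each challenge can be used only once."""
    expected = pending.pop(challenge_id, None)  # pop = single use
    if expected is None:
        return False
    # Compare as bytes so odd (non-ASCII) input can't raise, in constant time.
    return secrets.compare_digest(expected.encode(), answer.strip().upper().encode())
```

Popping the challenge as soon as it is checked means a bot can't keep retrying the same image until it guesses right; it has to start over with a fresh one.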

reCAPTCHA

Now owned by Google (not surprising), reCAPTCHA was started at Carnegie Mellon by a group that included, you guessed it, Luis von Ahn. Think of reCAPTCHA as the proprietary, brand-name version. It was a streamlined widget, showing those wonky-looking letters, that could be embedded on any website. Initially, reCAPTCHA was designed to help digitize books. Each challenge used one known word and a second, unknown word that users would work to solve (essentially putting millions of people to work digitizing the words that OCR software couldn't read).
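The two-word trick can be sketched in a few lines of Python. Because the user can't tell which word is the control, getting the known word right is taken as evidence that their answer for the unknown word is honest too, and once enough people agree, that answer is accepted as the digitized text. This is a simplified, hypothetical sketch: the function `submit` and the agreement threshold of three matching answers are assumptions for illustration, not reCAPTCHA's actual rules.

```python
from collections import Counter, defaultdict

# Hypothetical sketch of the known-word / unknown-word scheme.
# AGREEMENT_NEEDED is an assumed value, not reCAPTCHA's real threshold.

AGREEMENT_NEEDED = 3

votes = defaultdict(Counter)     # unknown-word id -> Counter of human guesses
digitized: dict[str, str] = {}   # unknown-word id -> accepted transcription


def submit(control_answer: str, control_expected: str,
           unknown_id: str, unknown_answer: str) -> bool:
    """Pass or fail the user on the control word; record their guess for the unknown word."""
    if control_answer.strip().lower() != control_expected.lower():
        return False  # failed the known word, so the other answer isn't trusted either
    guess = unknown_answer.strip().lower()
    votes[unknown_id][guess] += 1
    if votes[unknown_id][guess] >= AGREEMENT_NEEDED:
        digitized[unknown_id] = guess  # enough humans agree: call this word digitized
    return True
```

For example, three users who each type the control word correctly and all read the unknown word as "quixote" would be enough (under this assumed threshold) to record "quixote" as that word's transcription.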

It turns out that robots get smarter. With errors and concerns starting to surface by 2014, Google used its machine learning algorithms to test how solvable its current generation of puzzles was, and ended up being able to solve them 98% of the time. This spawned the No CAPTCHA reCAPTCHA initiative, in which Google started rolling out a next generation that used the now-familiar checkbox.


The checkbox doesn't work by seeing whether you can check a box; instead, it looks at your mouse movements and behavior on the site before the box is checked. It can also look at your past browsing data. It can't always reach a conclusion this way, and will often fall back to the image challenge (you know, the "click all of the squares with X" kind).
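Google doesn't publish exactly which signals the checkbox uses, so the Python below is only a toy illustration of the general idea, not Google's method. It assumes a single made-up signal: a scripted cursor tends to glide in a perfectly straight line, while a human hand wobbles. The functions `path_straightness` and `looks_scripted` and the 0.99 threshold are all hypothetical.

```python
import math

# Toy illustration only: NOT Google's algorithm, which isn't public.
# Idea: compare the straight-line distance of a mouse path to the distance
# actually travelled. Scripted movement is often perfectly straight.

Point = tuple[float, float]


def path_straightness(points: list[Point]) -> float:
    """Return straight-line distance / travelled distance (1.0 = perfectly straight)."""
    travelled = sum(math.dist(a, b) for a, b in zip(points, points[1:]))
    if travelled == 0:
        return 1.0
    return math.dist(points[0], points[-1]) / travelled


def looks_scripted(points: list[Point], threshold: float = 0.99) -> bool:
    """Hypothetical rule: too few samples or a near-perfect line counts as suspicious."""
    return len(points) < 3 or path_straightness(points) >= threshold
```

A real system would combine many signals like this (timing, scrolling, cookies, and so on), since any single one is easy for a determined bot to fake, which is exactly why the image challenge is still there as a fallback.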

There's Just One Big Problem 

The harder they make CAPTCHAs to solve, the better the "bots" get at solving them. The Verge article I read when researching this ramble (linked below) went into a long diatribe about what it means to be human, and how bots are vastly superior to us at specialized tasks like image recognition. But they do have a point: these tests have to be accessible to people all over the world and have to be standardized, which makes them the perfect thing to train AI to solve.

Oh yeah, and apparently there are CAPTCHA farms, where loads of low-paid workers sit and solve CAPTCHAs for fraudulent accounts all day.

The Next Version 

Version 3 of reCAPTCHA has no checkbox and no challenge for the user to solve. Instead, it watches users' actions on a site (plus their past browser cookies and behavior) and assigns them a score, which the site can use to flag potential bots or suspicious behavior. This freaks me out a little, but it will most likely be pretty effective, and it is a good way of thinking outside the box (quite literally) to solve a problem.
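On the website's side, using that score looks roughly like the Python sketch below. The real parts: the browser hands the site a token, the site's server sends it with its secret key to Google's `siteverify` endpoint, and the JSON reply includes fields such as `success`, `action`, and `score`, where the score runs from 0.0 (very likely a bot) to 1.0 (very likely a human), so higher is better despite the "risk score" name. Google suggests 0.5 as a default cut-off. The assumed parts: the function names `fetch_verdict` and `decide`, and the three-way allow/challenge/block policy, are my own illustration; each site picks its own response to a low score.

```python
import json
import urllib.parse
import urllib.request

VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def fetch_verdict(secret: str, token: str) -> dict:
    """POST the browser's token to Google's siteverify endpoint; return the parsed JSON."""
    data = urllib.parse.urlencode({"secret": secret, "response": token}).encode()
    with urllib.request.urlopen(VERIFY_URL, data=data, timeout=10) as resp:
        return json.load(resp)


def decide(verdict: dict, expected_action: str, threshold: float = 0.5) -> str:
    """Turn a siteverify result into 'allow', 'challenge', or 'block' (assumed policy)."""
    if not verdict.get("success") or verdict.get("action") != expected_action:
        return "block"  # invalid token, or it was issued for a different page/action
    score = verdict.get("score", 0.0)  # 1.0 = very likely human, 0.0 = very likely bot
    if score >= threshold:
        return "allow"
    # Hypothetical middle tier: borderline users get an extra check instead of a hard no.
    return "challenge" if score >= threshold / 2 else "block"
```

The "challenge" tier is a design choice rather than part of reCAPTCHA: instead of locking out a real person whose score happens to be low, the site can ask for something extra, like an email confirmation.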

Are You Human?

Tidbit:


Well, I guess that wraps things up, doesn't it...

Sources: 

I really, really try to give the whole story when writing these, and I think it's important to provide sources, especially since some outlets (definitely not Business Insider) had some inaccuracies, or at least some oversimplifications.


