Source: Auth0
Where It Began
When researching this ramble, I had a lot of trouble finding out who actually invented CAPTCHA. Most places credit Luis von Ahn with inventing it. However, if you read the incredibly detailed Wikipedia article on CAPTCHA, you will find that the answer is not as clear as other publications make it seem.
"Two teams have claimed to be the first to invent the CAPTCHAs used widely on the web today. The first team with Mark D. Lillibridge, Martín Abadi, Krishna Bharat, and Andrei Broder, used CAPTCHAs in 1997 at AltaVista to prevent bots from adding Uniform Resource Locators (URLs) to their web search engine. Looking for a way to make their images resistant to optical character recognition (OCR) attack, the team looked at the manual of their Brother scanner, which had recommendations for improving OCR's results (similar typefaces, plain backgrounds, etc.). The team created puzzles by attempting to simulate what the manual claimed would cause bad OCR.[citation needed]" (Sourced from Wikipedia)
This excerpt, copied directly from Wikipedia, would make you think that von Ahn was incorrectly credited with the invention of CAPTCHA. That is, until you finish reading the paragraph and notice the glaring "citation needed" tag. That's right! I really couldn't find these details anywhere else, although a few other articles (far fewer, though) did mention this first team. The only thing that proves this team actually invented something is a 1997 patent, which is extremely vague, and as near as I can tell, the team did nothing further with the idea. It seems to be more of a rough idea claim than an invention.
The story of von Ahn & Co. is much clearer. In 2000, Yahoo Mail had a problem: spam. Every day, attackers used programs to sign up for and manage hundreds, thousands, and even millions of spam and malicious email accounts. To stop this, von Ahn came up with, and wrote, a way to tell whether a user was a human or a bot. Before registering an account, the user had to read warped or otherwise obscured (but still readable) text and then type what they read. This is when the term CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) was coined, a name that never came up in the original 1997 patent. And yes, this is definitely an acronym that was made to fit.
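Stripped of the image distortion, that sign-up check is a simple challenge/response round trip: the server picks some random text, shows it (warped) to the user, and later compares what they typed. Here is a minimal Python sketch of just that round trip. It is not von Ahn's original code; `issue_challenge`, `verify`, and the in-memory store are hypothetical, and the actual rendering of distorted text is left out.

```python
import secrets
import string

# Hypothetical server-side store: challenge id -> expected text.
# A real system would render the text as a distorted image and keep the
# answer out of the client's reach (server-side state or a signed token).
_pending: dict[str, str] = {}

ALPHABET = string.ascii_uppercase + string.digits


def issue_challenge(length: int = 6) -> tuple[str, str]:
    """Create a random challenge; returns (challenge_id, text_to_render)."""
    challenge_id = secrets.token_hex(8)
    text = "".join(secrets.choice(ALPHABET) for _ in range(length))
    _pending[challenge_id] = text
    return challenge_id, text


def verify(challenge_id: str, answer: str) -> bool:
    """Check the typed answer. Each challenge can be tried only once,
    so a bot cannot keep guessing against the same image."""
    expected = _pending.pop(challenge_id, None)
    if expected is None:
        return False  # unknown or already-used challenge
    # Be forgiving about case and stray whitespace, as humans make those slips.
    return answer.strip().upper() == expected


cid, text = issue_challenge()
print(verify(cid, text.lower()))  # typed correctly, different case
print(verify(cid, text))          # same challenge again is rejected
```

The one-shot `pop` is the important design choice here: without it, an attacker could brute-force a single image as many times as they liked.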
reCAPTCHA
Now owned by Google (not surprising), reCAPTCHA was started at Carnegie Mellon by a group that included, you guessed it, Luis von Ahn. Think of reCAPTCHA as the proprietary, brand-name version. It was a streamlined widget, complete with the wonky-looking letters, that could be embedded on any website. Initially, reCAPTCHA was designed to help digitize books. Each challenge used one known word and a second, unknown word that users would work to solve (essentially putting millions of people to work digitizing text, one unknown word at a time).
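The known/unknown word trick can be sketched roughly like this. It is a simplified illustration, not reCAPTCHA's actual code: `submit`, the vote store, and the agreement threshold of 3 are all hypothetical. The idea is that the known (control) word decides whether you pass, and only a passing user's reading of the unknown word gets counted as a vote toward its transcription.

```python
from collections import Counter, defaultdict

# Hypothetical number of matching human answers before an unknown word
# is accepted as transcribed (the real rules were more involved).
AGREEMENT_THRESHOLD = 3

votes = defaultdict(Counter)      # unknown word id -> Counter of guesses
transcribed: dict[str, str] = {}  # unknown word id -> accepted reading


def submit(control_answer: str, control_expected: str,
           unknown_id: str, unknown_answer: str) -> bool:
    """Grade one two-word challenge.

    The user passes only if the known word is correct; only then is
    their reading of the unknown word recorded as a vote.
    """
    if control_answer.strip().lower() != control_expected.lower():
        return False  # failed the known word: reject and ignore the guess
    guess = unknown_answer.strip().lower()
    votes[unknown_id][guess] += 1
    if votes[unknown_id][guess] >= AGREEMENT_THRESHOLD:
        transcribed.setdefault(unknown_id, guess)
    return True
```

Because a bot that gets the known word wrong never gets a vote, its guesses cannot pollute the digitized text, and agreement across several humans filters out individual typos.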
It turns out, though, that robots get smarter. With errors and concerns starting to surface by 2014, Google used its own machine learning algorithms to test how solvable the current generation of puzzles was, and found it could solve them 98% of the time. This spawned the noCAPTCHA reCAPTCHA initiative, in which Google started rolling out a next generation built around the now-familiar checkbox.
There's Just One Big Problem
The harder they make CAPTCHAs to solve, the better the "bots" get at solving them. The Verge article I read while researching this ramble (linked below) went into a long diatribe about what it means to be human, and how bots are vastly superior to us at specialized tasks like image recognition. But they do have a point: these tests have to be accessible to people all over the world and have to be standardized, which makes them the perfect thing to train AI to solve.
Oh yeah, and apparently there are CAPTCHA farms, where lots of low-paid workers sit and solve CAPTCHAs for fraudulent accounts all day.
The Next Version
Version 3 of reCAPTCHA has no button or visible sign of its presence. Instead, it watches users' actions on a site (plus their past browser cookies and behavior) and assigns them a risk score, which can be used to identify potential bots or suspicious behavior. This freaks me out a little bit, but it will most likely be pretty effective, and it is a good way of thinking outside the box (quite literally) to solve the problem.
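So what does a site actually do with that risk score? Roughly, the page gets a token from reCAPTCHA, and the site's server sends that token (plus its secret key) to Google's `siteverify` endpoint, which answers with JSON that includes `success`, `score` (0.0 to 1.0, higher meaning more likely human) and `action`. Below is a minimal Python sketch of that server-side check. The helper names are my own, not an official client library, and the 0.5 cut-off is an assumed starting point that each site would need to tune.

```python
import json
import urllib.parse
import urllib.request

VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def fetch_verdict(secret: str, token: str) -> dict:
    """POST the page's token to Google's siteverify endpoint (needs network)."""
    data = urllib.parse.urlencode({"secret": secret, "response": token}).encode()
    with urllib.request.urlopen(VERIFY_URL, data=data, timeout=10) as resp:
        return json.load(resp)


def looks_human(verdict: dict, expected_action: str,
                threshold: float = 0.5) -> bool:
    """Turn a v3 verdict into a yes/no decision.

    The 0.5 threshold is an assumed default; a site might instead send
    low scores to extra verification rather than block them outright.
    """
    if not verdict.get("success"):
        return False  # invalid, expired, or reused token
    if verdict.get("action") != expected_action:
        return False  # token was issued for a different page action
    return float(verdict.get("score", 0.0)) >= threshold


# Example with a verdict shaped like siteverify's JSON reply:
print(looks_human({"success": True, "score": 0.9, "action": "signup"}, "signup"))
```

Splitting the network call from the decision keeps the scoring rule easy to test and tweak, which matters because the "right" threshold depends entirely on how much friction a site is willing to put on real people.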
Are You Human?
Tidbit:
Well I guess that wraps things up, doesn't it...
Sources:
I really really try to give the whole story when writing these, and I think it's important to provide sources, especially since some outlets (definitely not Business Insider) had some inaccuracies, or at least some oversimplifications.