CAPTCHAs and other strategies for outsmarting bots

“For the last time, I am not a robot!”

You’re browsing your Facebook news feed, and—oh look—the Counting Crows are coming to town, and tickets are on sale now! You think to yourself, Facebook must really get me. You gather a few commitments from friends and head on over to the ticket vendor’s website. You pick your seats, gawk at the predatory pricing, but give in and proceed to pay.

But wait … what’s this? Before you are allowed to move on, you are presented with a weird little checkbox and a message asking you to declare that you are “Not a Robot.”

You have seen this before, and on occasion this little widget has even asked you to tag images that “contain pictures of fire hydrants.” Weird. You check the box, the widget does a little dance, and you are allowed to purchase your piece of ’90s nostalgia.

Still, you can’t help but think: what if I were a robot? How would they be able to tell? Can’t robots click checkboxes, and what does a robot want with concert tickets anyway?

About web bots

These robots, or bots, are actually just little software programs that perform tasks on a website in place of a human actor. Software bots have many advantages over humans, and most are deployed for noble causes, Google’s site-indexing web crawler being a prime example.

Bots perform repetitive tasks very quickly and without complaint. They can visit hundreds or even thousands of web pages in a second.

Without any laws or safeguards in place, a motivated individual with sufficient means could deploy a bot to buy every ticket to every major concert sold online. That is one of the downsides of a low-friction digital economy, and it’s why safeguards must be deployed to stop, or at least slow down, the bad actors who would ruin it for the rest of us.

Enter the CAPTCHA

Websites have been asking visitors to prove their humanness since the early 2000s. Multiple parties claim credit for coming up with the idea, but a 1997 patent awarded to the Sanctum corporation of Santa Clara, CA, has resolved the issue, at least legally.

The term CAPTCHA, an acronym for “completely automated public Turing test to tell computers and humans apart,” wasn’t actually used until 2003, when it appeared in a paper published by several computer science researchers. Intellectual ownership details aside, the CAPTCHA was invented to solve a very real problem of the rapidly evolving web.

Companies and individuals need to gather information from visitors using web forms. Web forms are how we book flights, troll subreddits, and order concert tickets. Because web forms are based on open standards and easily understood by software programs, it is very simple to develop bots that can submit massive amounts of information through a web form quickly. That is where the CAPTCHA, the ubiquitous “bouncer” of the web, steps in to keep the bots out.
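To make the point about open standards concrete: by default, a browser submits a form as an ordinary HTTP POST whose body is the field values encoded as `application/x-www-form-urlencoded`. The minimal Python sketch below builds such a request with the standard library. The URL and field names are hypothetical, and the requests are only constructed here, never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_form_submission(action_url: str, fields: dict[str, str]) -> Request:
    """Build (but do not send) the POST request a browser would make for a form.

    `action_url` and `fields` are hypothetical stand-ins for a form's
    action attribute and its input names and values.
    """
    # Ordinary form fields are encoded as application/x-www-form-urlencoded.
    body = urlencode(fields).encode("ascii")
    return Request(
        action_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Nothing stops a program from producing one submission after another in a loop.
requests_to_send = [
    build_form_submission(
        "https://tickets.example.com/checkout",  # hypothetical URL
        {"seat": f"A{n}", "quantity": "2"},
    )
    for n in range(1, 4)
]
print(requests_to_send[0].data)  # b'seat=A1&quantity=2'
```

Because the format is standardized, the same few lines work against almost any form. That is exactly the gap a CAPTCHA is meant to close.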

How the first CAPTCHAs worked

The first iterations of the CAPTCHA typically consisted of an image containing some distorted text, accompanied by a textbox. Before submitting information to a web form, a visitor was prompted to type the text rendered in the image into the box. When the form was submitted, software on the server checked that the typed text matched the image. There are many implementations of this model, some better than others, but they are all based on the assumption that humans are much better than software bots at deciphering text from images.
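The server-side half of that check might be sketched as follows. This is an illustration under stated assumptions, not any vendor’s implementation: the in-memory dictionary stands in for real session storage, the step that renders the text as a distorted image is left out, and every name here is hypothetical.

```python
import hmac
import secrets

# Hypothetical in-memory store mapping a challenge ID to its expected answer.
# A real site would keep this in server-side session storage.
_pending: dict[str, str] = {}

# Leave out easily confused characters such as 0/O and 1/I.
ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"


def new_challenge(length: int = 6) -> tuple[str, str]:
    """Return (challenge_id, text); the text would be drawn as a distorted image."""
    text = "".join(secrets.choice(ALPHABET) for _ in range(length))
    challenge_id = secrets.token_urlsafe(16)
    _pending[challenge_id] = text
    return challenge_id, text


def verify(challenge_id: str, answer: str) -> bool:
    """Check a submitted answer; each challenge can be used only once."""
    # pop() removes the challenge so a solved one cannot be replayed.
    expected = _pending.pop(challenge_id, None)
    if expected is None:
        return False
    # Ignore case and surrounding whitespace; compare bytes in constant time.
    return hmac.compare_digest(
        answer.strip().upper().encode("utf-8"), expected.encode("utf-8")
    )


cid, text = new_challenge()
print(verify(cid, text.lower()))  # True: case is ignored
print(verify(cid, text))          # False: the challenge was already used
```

Making each challenge single-use matters: if a solved image could be reused, a bot would only need a human to solve it once.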

For a long time, this was true, and this version of the CAPTCHA was able to keep the bots at bay pretty well. However, as computers became more powerful and computer science advanced, the bots slowly gained the upper hand.

Optical character recognition (OCR) algorithms got better and better. With OCR, bot writers found it increasingly easy to solve a CAPTCHA.

CAPTCHA generators responded by changing distortion patterns or adding noise to the images in an attempt to stay one step ahead. In the end, however, the bots won out.

As the field of machine learning advanced, it became increasingly easy to solve even the most cryptic image-based CAPTCHA. In fact, solving an image-based CAPTCHA is often used to test how good an AI program is at object recognition.

The emergence of reCAPTCHAs

Not willing to stand by and let the bots take over, researchers responded with their own AI-based weapon in the form of the behavior-based reCAPTCHA v2 (now owned by Google). This model drops the distorted-text challenge and instead attempts to classify a site visitor as a bot or a human based on a multitude of factors, including how the visitor interacts with the site.

When the visitor is asked to affirm that they are not a robot, these environmental and behavioral factors are scored based on a statistical model. If the score passes a certain threshold, you get to buy your concert tickets. If not, you will be asked to classify a series of images to prove that you are a person.
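Google does not publish the model behind reCAPTCHA, so the following is only a toy illustration of the general idea: combine a few behavioral signals into a score with a simple statistical model (here, a logistic function over a weighted sum) and compare that score against a threshold. The signal names, weights, and threshold are all invented for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class Signals:
    """Hypothetical behavioral signals; these are not reCAPTCHA's real inputs."""
    mouse_moves: int          # pointer movements before clicking the checkbox
    seconds_on_page: float    # time between page load and the click
    has_prior_cookies: bool   # whether the browser carries earlier cookies


# Invented weights and threshold, for illustration only.
WEIGHTS = {"mouse_moves": 0.05, "seconds_on_page": 0.3, "has_prior_cookies": 1.5}
BIAS = -3.0
THRESHOLD = 0.5


def human_score(s: Signals) -> float:
    """Logistic model: squash a weighted sum of the signals into (0, 1)."""
    z = (BIAS
         + WEIGHTS["mouse_moves"] * s.mouse_moves
         + WEIGHTS["seconds_on_page"] * s.seconds_on_page
         + WEIGHTS["has_prior_cookies"] * s.has_prior_cookies)
    return 1.0 / (1.0 + math.exp(-z))


def next_step(s: Signals) -> str:
    """Let likely humans through; send everyone else to an image challenge."""
    return "proceed" if human_score(s) >= THRESHOLD else "image_challenge"


print(next_step(Signals(40, 6.0, True)))   # proceed
print(next_step(Signals(0, 0.2, False)))   # image_challenge
```

The image challenge acts as a fallback rather than a verdict: a low score only means the model is unsure, so the visitor gets a second way to prove they are human.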

By classifying images, not only are you proving your humanity, you are actually helping to improve Google’s object classification models.

About reCAPTCHA v3

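Version 3 of reCAPTCHA takes this concept even further by doing away with the affirmation entirely. Instead of asking the visitor to do anything, it returns a score for each interaction, ranging from 0.0 (very likely a bot) to 1.0 (very likely a human). Website developers who deploy this version can use these behavior-based scores to create their own affirmations or to deploy their own fraud prevention strategies.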
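On the server side, a site using reCAPTCHA v3 sends the token produced in the visitor’s browser, together with its secret key, to Google’s `siteverify` endpoint. It then reads back a JSON result that includes `success`, `score`, and `action` fields. The Python sketch below follows that documented request and response shape. The secret key, token, and action name are placeholders, and only `verify_token` actually touches the network.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request, urlopen

SITEVERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def build_verify_request(secret: str, token: str) -> Request:
    """POST the site's secret key and the token the browser obtained."""
    body = urlencode({"secret": secret, "response": token}).encode("ascii")
    return Request(SITEVERIFY_URL, data=body, method="POST")


def read_score(result: dict, expected_action: str) -> Optional[float]:
    """Return the 0.0-1.0 score, or None if the token did not verify."""
    if not result.get("success"):
        return None
    # Reject a token that was issued for a different action on the site.
    if result.get("action") != expected_action:
        return None
    return float(result.get("score", 0.0))


def verify_token(secret: str, token: str, expected_action: str) -> Optional[float]:
    """Network call; needs a real secret key and a fresh token to succeed."""
    with urlopen(build_verify_request(secret, token), timeout=10) as resp:
        return read_score(json.load(resp), expected_action)
```

The sketch deliberately returns a score rather than a yes/no answer. What counts as “human enough” is left to the site.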

A site may want to set a low threshold to allow a visitor to post to a discussion thread, but a high threshold to submit a contact form. Both can be accomplished without friction, and the site can take the appropriate action based on its specific needs.
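Per-action thresholds like these might look something like the following in a site’s own code. This is a sketch under stated assumptions: the action names, threshold values, and fallback responses are all hypothetical. The score is assumed to be reCAPTCHA v3’s 0.0 to 1.0 value, or `None` when verification failed.

```python
from typing import Optional

# Hypothetical thresholds: lenient for discussion posts, strict for contact forms.
THRESHOLDS = {"post_reply": 0.3, "contact_form": 0.7}

# Hypothetical softer responses for low scores, chosen per action.
FALLBACKS = {"post_reply": "hold_for_moderation", "contact_form": "require_email_confirmation"}


def decide(action: str, score: Optional[float]) -> str:
    """Map a score to a site-specific response instead of a single hard block."""
    if score is None:
        return "reject"  # token missing, invalid, or issued for another action
    if score >= THRESHOLDS[action]:
        return "accept"
    return FALLBACKS[action]


print(decide("post_reply", 0.4))    # accept
print(decide("contact_form", 0.4))  # require_email_confirmation
```

The same score of 0.4 leads to different outcomes depending on what the visitor is trying to do. That is the flexibility the per-action approach is meant to provide.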

The behavior-based CAPTCHA, using the power of AI and heaps of data, has proven to be an effective weapon against bots. Of course, if it’s possible to determine whether a visitor is human using a statistical model, it’s also possible for a bot to pretend to be human using the right software and a similar model.

As the field of software AI continues to advance, the arms race between bots and those who would stop them will continue, at least until the bots become so good that we willingly let them buy concert tickets for us.

Key takeaways

Deploying a sophisticated spam management strategy is no longer optional. Without the right protection in place, bots will find ways to manipulate your web forms.

If you’ve noticed an uptick in junk submissions through your web forms, it might be time to consider an update that solves your spam problem while keeping it as easy as possible for your customers to submit a request or comment.