CAPTCHA-less Security

CAPTCHA is perceived as a quick and effective way to stop bots from performing abusive actions on a website. Bots are often deployed to do things like automatically entering spam into email or comment forms. They can also be used to submit fraudulent entries in other forms, such as registration or voting forms. CAPTCHA works by presenting a challenge to the user (typically an image containing jumbled-up letters) which must be solved to proceed in the interaction flow.

Example of a CAPTCHA image with the words “Following” and “Finding” which must be entered into an associated field

On the surface, CAPTCHA seems perfect because bots only have access to what is in the document source. Text within images cannot be read by an internet bot, and therefore the bot cannot submit a response to the challenge. This is also why CAPTCHA is an accessibility problem: requiring vision to solve the CAPTCHA locks out all persons who are blind. Lest we think only persons who are blind are affected, CAPTCHAs can often also lock out low-vision users and those with dyslexia, particularly when there’s a lot of “noise” in the image.

Some have attempted to create alternate versions of the typical image CAPTCHA, such as the well-known reCAPTCHA, which pairs the image with an audio alternative. In nearly all cases, some accessibility problems still remain. For instance, reCAPTCHA is still inaccessible to the 45,000 to 50,000 Deaf-blind persons in the United States.

CAPTCHA is also not as effective as some may believe. Automated means of beating CAPTCHA have been around since 2003. As CAPTCHA techniques advance, so do the means of beating them. There are even services that employ humans to solve CAPTCHAs.

Keep this in mind at all times when considering CAPTCHA or any other security approach on your site: the level of effort expended on abusing a system is directly proportional to the perceived benefit gained by the abuser. This applies to the recommendations I make below as well. CAPTCHA is, in many cases, very effective; otherwise websites wouldn’t use it. But it does lock real people out of your site, and it can be beaten. For those reasons, I’d like to discuss some approaches to thwarting website abuse without CAPTCHA.

CAPTCHA-less Security Approaches

Because all of the code for all of my sites (except this one, ironically) is home-grown, I’ve developed my own code to handle security as well. This has its advantages and disadvantages, primarily because it took a long time of learning (some of it painful, to be honest) to get my code where it is today. But I’m proud to say that, using the approaches below, I’ve eliminated all spam and fraudulent registrations on the sites that use this code. Keep in mind that the more attractive a site is for abuse, the harder abusive users will try to find exploits. As I said earlier, in certain scenarios humans can even be employed to simply overcome whatever automated methods you have in place to fight abuse. Here’s what I’ve used with success.

Filter, Validate, Escape

Not directly related to CAPTCHA is the need to filter, validate, and escape all input. Every developer should be doing this at all times when building systems that use forms. The topic could take up several posts on its own, so instead I’d like to point you to Chris Shiflett and encourage you to read his articles, blog, and the books he’s written on the subject. I’ll go over some of these topics here; check out Chris Shiflett’s work for more details.

Filter all input

“Input filtering is the method by which you validate all incoming data and prevent any invalid data from being used by your application. It’s very similar in theory to how water filtering works, where impurities in water are not allowed to pass.” (Chris Shiflett)

In my approach, I filter all input from PHP’s superglobals. Any key from a superglobal array that I do not expect is automatically removed. For instance, if I’m only expecting ‘id’ from $_GET, then that is the only key that is kept. Furthermore, I strip out any input I consider out of bounds for the type of content expected. For instance, if I’m expecting a number for the value of ‘id’, then all non-numeric characters are stripped. If I’m expecting alphanumerics, then anything that is not a letter or a number is stripped, and so on.
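The author’s filtering code is PHP and isn’t shown here, so the following is only a minimal Python sketch of the same idea, assuming a hypothetical per-form schema (the `SCHEMA` dictionary and its two content kinds are illustrative, not the author’s). Unexpected keys are dropped, and characters outside each key’s allowed set are stripped.

```python
import re

# Hypothetical schema: each expected key maps to the kind of content it may hold.
# Any key that is not listed here is dropped entirely.
SCHEMA = {
    "id": "numeric",
    "username": "alphanumeric",
}

# For each kind, a pattern matching every character that is NOT allowed.
STRIP_PATTERNS = {
    "numeric": re.compile(r"[^0-9]"),
    "alphanumeric": re.compile(r"[^A-Za-z0-9]"),
}


def filter_input(raw: dict, schema: dict = SCHEMA) -> dict:
    """Keep only expected keys, stripping out-of-bounds characters from each value."""
    filtered = {}
    for key, kind in schema.items():
        if key in raw:
            filtered[key] = STRIP_PATTERNS[kind].sub("", str(raw[key]))
    return filtered


if __name__ == "__main__":
    query = {"id": "42abc", "username": "karl!", "debug": "1"}
    print(filter_input(query))  # {'id': '42', 'username': 'karl'}
```

Filtering this way is deliberately lossy; validation (next section) still decides whether what survives is acceptable.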

Validate everything

Validate strongly to ensure that input adheres to very specific constraints. In the PHP forms class I created for use on all my sites, I have 48 different types of validation, ranging from simple string length validation to rather involved regular expressions. Which validation rules apply depends on the type of expected input, but everything is validated in some way after input filtering. Even if a field isn’t required, it still gets filtered and validated against various rules meant to prevent abuse. For some of the validation rules, the user is permanently blocked from access as soon as a submission fails.
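As a rough illustration (not the author’s 48-rule PHP class), a rule table can mark certain failures as ban-worthy. The field names, limits, and the “no link markup” rule below are hypothetical examples. Note that every field in the table is checked even when it was not submitted, mirroring the point that optional fields are still validated.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Rule:
    """A single validation rule; ban_on_fail marks failures treated as abuse."""
    name: str
    check: Callable[[str], bool]
    ban_on_fail: bool = False


# Illustrative rules only (hypothetical field names and limits).
RULES: Dict[str, List[Rule]] = {
    "username": [
        Rule("length 3-20", lambda v: 3 <= len(v) <= 20),
        Rule("letters and digits only", lambda v: re.fullmatch(r"[A-Za-z0-9]+", v) is not None),
    ],
    "comment": [
        Rule("max 2000 characters", lambda v: len(v) <= 2000),
        # Link markup in a plain-text field is treated as a spam signal.
        Rule("no link markup", lambda v: "<a " not in v.lower() and "[url" not in v.lower(),
             ban_on_fail=True),
    ],
}


def validate(data: Dict[str, str]) -> Tuple[List[str], bool]:
    """Check every field in RULES (even optional, empty ones); return (errors, should_ban)."""
    errors: List[str] = []
    should_ban = False
    for field, rules in RULES.items():
        value = data.get(field, "")
        for rule in rules:
            if not rule.check(value):
                errors.append(f"{field}: {rule.name}")
                should_ban = should_ban or rule.ban_on_fail
    return errors, should_ban
```

A caller would reject the submission whenever `errors` is non-empty and, separately, ban the sender when `should_ban` is true.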

Escape Output

In this process, all input is escaped to prevent SQL injection, XSS, mail header injection, and so on. By the time a submission is accepted, most of it has already been validated against fairly strict rules, and any sign of abuse results in immediate banning of the offender. Still, all submissions are escaped, and submissions stored in a database use prepared statements. On the way out, content is escaped as well. This extra step may seem redundant, but it acts as an added layer of protection (in this case, protecting the user in case the previous steps were inadequate).
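Here is a minimal Python sketch of the three protections named above, using only the standard library: parameterized queries via `sqlite3`, `html.escape` on output, and rejecting CR/LF in values headed for mail headers. The `comments` table and function names are hypothetical; the author’s own implementation is PHP.

```python
import html
import sqlite3


def reject_header_injection(value: str) -> str:
    """Refuse CR/LF in values destined for mail headers (e.g. Subject, From)."""
    if "\r" in value or "\n" in value:
        raise ValueError("possible mail header injection")
    return value


def store_comment(conn: sqlite3.Connection, author: str, body: str) -> None:
    # Placeholders (?) let the driver bind values, so input is never spliced into SQL text.
    conn.execute("INSERT INTO comments (author, body) VALUES (?, ?)", (author, body))
    conn.commit()


def render_comment(author: str, body: str) -> str:
    # Escape again on the way out so stored content cannot inject markup (XSS).
    return f"<p><strong>{html.escape(author)}</strong>: {html.escape(body)}</p>"


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE comments (author TEXT, body TEXT)")
    store_comment(conn, "bob'); DROP TABLE comments;--", "<script>alert(1)</script>")
    for author, body in conn.execute("SELECT author, body FROM comments"):
        print(render_comment(author, body))
```

The hostile author string is stored verbatim as data rather than executed, and the script tag is rendered as inert text.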

Honeypot

A spam honeypot is a field intended to trap spammers by detecting attempted spam or fraudulent registrations. One of the ways spammers try to exploit a site is the automatic submission of forms. It is relatively trivial to create a bot that crawls the web, looking for forms and filling them in. Looking for “email”, “e-mail”, or some other variant as the value of the ‘name’ attribute on fields is also trivial, and it gives spammers a relatively easy way to exploit forms, unless, of course, some other method exists to stop them, which is the entire purpose of CAPTCHA. Bots vary widely, so no method is perfect, but honeypots tend to work well against bots designed to fill in all fields.

To implement a honeypot, create an ordinary text field that will be visually hidden:

<label for="honeypot">Enter something here if you're a spammer</label><input type="text" id="honeypot" name="honeypot">

Then, use CSS to position the item offscreen. Using this method, you now have an accessible means of tripping up bots.
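On the server side, the check is simple. This Python sketch assumes the field is named `honeypot`, as in the markup above; the `handle_submission` function is hypothetical, and a real handler would also log the request and ban the IP.

```python
from typing import Mapping


def honeypot_tripped(form: Mapping[str, str], field: str = "honeypot") -> bool:
    """Return True when the visually hidden honeypot field was filled in.

    Sighted users never see the field, and screen reader users are told
    not to fill it, so any non-empty value is treated as a bot submission.
    """
    return form.get(field, "").strip() != ""


def handle_submission(form: Mapping[str, str]) -> str:
    # A real handler would also log the request and ban the IP here.
    if honeypot_tripped(form):
        return "rejected"
    return "accepted"
```

Whitespace-only values are treated as empty, so a stray space from an honest user does not count as a trip.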

Automated Banning

As I mentioned in the section on validation, I aggressively and immediately ban all abusive requests I detect. But banning actually starts even before a form is submitted, using the following approaches.

External Services for Checking Emails and IPs

I’ve created an automated cron script that uses cURL to grab spammy IPs and email addresses from various services. One of my favorites is Stop Forum Spam. I store those items in my own database, because I don’t want to burden these services with constant lookups. The stored items are then checked at the time of initial request and again during form submission as part of the validation process.
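The import step of such a cron job might look like this Python sketch. Downloading is omitted, and the feed format (one IP or email address per line, `#` for comments) is an assumption for illustration, not Stop Forum Spam’s documented format. The local `blocklist` table name is likewise hypothetical.

```python
import ipaddress
import sqlite3
from typing import Set, Tuple


def create_store(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS blocklist ("
        "kind TEXT NOT NULL, value TEXT NOT NULL, PRIMARY KEY (kind, value))"
    )


def normalize(kind: str, value: str) -> str:
    """Lowercase emails; canonicalize IPs (raises ValueError for an invalid IP)."""
    value = value.strip().lower()
    return str(ipaddress.ip_address(value)) if kind == "ip" else value


def parse_feed(text: str) -> Tuple[Set[str], Set[str]]:
    """Assumed feed format: one IP or email per line; '#' lines are comments."""
    ips: Set[str] = set()
    emails: Set[str] = set()
    for line in text.splitlines():
        entry = line.strip()
        if not entry or entry.startswith("#"):
            continue
        try:
            ips.add(normalize("ip", entry))
        except ValueError:
            if "@" in entry:
                emails.add(normalize("email", entry))
    return ips, emails


def sync(conn: sqlite3.Connection, feed_text: str) -> None:
    """Merge a downloaded feed into the local table; duplicates are ignored."""
    ips, emails = parse_feed(feed_text)
    rows = [("ip", ip) for ip in ips] + [("email", e) for e in emails]
    conn.executemany("INSERT OR IGNORE INTO blocklist (kind, value) VALUES (?, ?)", rows)
    conn.commit()


def is_listed(conn: sqlite3.Connection, kind: str, value: str) -> bool:
    row = conn.execute(
        "SELECT 1 FROM blocklist WHERE kind = ? AND value = ?", (kind, normalize(kind, value))
    ).fetchone()
    return row is not None
```

Because lookups hit the local table, request-time checks never depend on the remote service being reachable.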

UPDATE: Since writing the above, I’ve created my own service, called BotSmasher, which aggregates the data from several services and provides an API you can query to check whether a submitted email address or the user’s IP address has previously been caught submitting spam or abusing systems.

Internal Systems for Checking Emails and IPs

Internally, all email addresses and IPs that have been banned (by any means, such as those described above) are retained in a database table. All IPs are checked at the time of request, and the request is immediately rejected if the user’s IP has ever been banned. All submissions of all forms are logged as well. Any time a submission is found to be abusive (see “Filter, Validate, Escape” above), the IP address associated with the request is immediately added to this table. If the form in question had an email field, then the email address is banned as well.
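A minimal Python sketch of such an internal ban table follows, using an in-process SQLite database. The `BanList` class and `bans` table are hypothetical names, and submission logging is left out for brevity.

```python
import sqlite3
from datetime import datetime, timezone
from typing import Optional


class BanList:
    """Sketch of an internal ban table keyed by IP and, optionally, email."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS bans ("
            "kind TEXT NOT NULL, value TEXT NOT NULL, banned_at TEXT NOT NULL, "
            "PRIMARY KEY (kind, value))"
        )

    def is_banned(self, ip: str, email: Optional[str] = None) -> bool:
        """Checked at request time: reject if the IP (or email, if given) was ever banned."""
        checks = [("ip", ip)]
        if email:
            checks.append(("email", email.strip().lower()))
        for kind, value in checks:
            row = self.conn.execute(
                "SELECT 1 FROM bans WHERE kind = ? AND value = ?", (kind, value)
            ).fetchone()
            if row is not None:
                return True
        return False

    def record_abuse(self, ip: str, email: Optional[str] = None) -> None:
        """Ban the request's IP and, if the form had one, its email address."""
        now = datetime.now(timezone.utc).isoformat()
        rows = [("ip", ip, now)]
        if email:
            rows.append(("email", email.strip().lower(), now))
        self.conn.executemany(
            "INSERT OR IGNORE INTO bans (kind, value, banned_at) VALUES (?, ?, ?)", rows
        )
        self.conn.commit()
```

Banning the email as well as the IP means an abuser who switches networks is still caught if they reuse the address.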

Registration requires confirmation

For any of my sites that require membership for certain areas, users must register with a working email address, to which I send a confirmation email. Users must click the link in that email, which takes them back to the site, in order to confirm their registration and be granted access. This tactic is pretty common on the web, and it works for two reasons: first, it stops bots dead, because they often enter nonsensical email addresses that go nowhere; second, even when the fraudulent submissions are made by humans using a working email address, they aren’t going to waste their time clicking confirmation links. One of my sites has been up for 3 years and not once has a spammer confirmed their registration.
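Confirmation links boil down to an unguessable, single-use token tied to the pending registration. This Python sketch keeps state in memory and assumes a hypothetical 48-hour expiry. Sending the email itself is out of scope, and the example URL in the comment is illustrative.

```python
import secrets
import time
from typing import Dict, Optional, Set, Tuple

# Hypothetical lifetime for an unconfirmed registration link.
CONFIRM_TTL_SECONDS = 48 * 3600


class PendingRegistrations:
    def __init__(self) -> None:
        self._pending: Dict[str, Tuple[str, float]] = {}  # token -> (email, created_at)
        self.confirmed: Set[str] = set()

    def register(self, email: str, now: Optional[float] = None) -> str:
        token = secrets.token_urlsafe(32)
        created = time.time() if now is None else now
        self._pending[token] = (email, created)
        # A real site would now email a link such as https://example.com/confirm?token=<token>
        return token

    def confirm(self, token: str, now: Optional[float] = None) -> Optional[str]:
        """Activate the account if the token exists and has not expired (single use)."""
        entry = self._pending.pop(token, None)
        if entry is None:
            return None
        email, created = entry
        current = time.time() if now is None else now
        if current - created > CONFIRM_TTL_SECONDS:
            return None
        self.confirmed.add(email)
        return email
```

Accounts that never confirm simply stay inactive, which is why unconfirmed bot registrations do no harm.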

Temporary Tokens

One tactic employed by bot developers is to copy the form itself and then use their script to send fraudulent submissions. Doing so helps them get past validation: once they know the expected format for each field, they can submit that information repeatedly (doing this with cURL is super simple). To prevent this, I use a temporary token that is assigned to every user at the start of their session. The token expires when the session expires, and it is submitted with each form request. This essentially means that the user submitting the form must actually be on the site, and the submitted token must match the value stored for that session, or the submission will fail.
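A minimal Python sketch of the per-session token follows, with a plain dict standing in for whatever session store the site uses. The `form_token` key and function names are hypothetical. Tokens come from the `secrets` module and are compared with `hmac.compare_digest`.

```python
import hmac
import secrets
from typing import Optional


def start_session(session: dict) -> str:
    """Assign a token once per session; it lives exactly as long as the session."""
    if "form_token" not in session:
        session["form_token"] = secrets.token_hex(32)
    return session["form_token"]


def render_token_field(session: dict) -> str:
    # token_hex output is [0-9a-f] only, so it is safe to place in the attribute as-is.
    return f'<input type="hidden" name="form_token" value="{start_session(session)}">'


def token_is_valid(session: dict, submitted: Optional[str]) -> bool:
    """Reject submissions whose token is missing or does not match this session's."""
    expected = session.get("form_token")
    if not expected or not submitted:
        return False
    # Compare as bytes in constant time; bytes also tolerate non-ASCII attacker input.
    return hmac.compare_digest(expected.encode(), submitted.encode())
```

A copied form replayed from a script has no matching session, so its token (stale or absent) fails the check.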

Other CAPTCHA-less techniques

The above are all things that I currently do on my sites. There are a couple of other techniques that I think show promise at thwarting abuse.

Confirmation Screens

A confirmation screen is, in a lot of ways, a challenge-response. Confirmation screens also help you comply with WCAG 2.0 Success Criterion 3.3.4, Error Prevention (Legal, Financial, Data). Simply asking the user to confirm their form submission is a great way to beat bots. However, confirmation screens only make sense in certain situations; using one on a login screen would be silly.
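One way to structure a confirmation screen is as a two-step flow: the first submission only stores the data and shows it back, and nothing is committed until a second request confirms it from the same session. This Python sketch uses a hypothetical in-memory `ConfirmationFlow` class; a bot that posts once and moves on never reaches the commit step.

```python
import secrets
from typing import Dict, List, Tuple


class ConfirmationFlow:
    """Two-step submit: preview() stores the data, confirm() commits it."""

    def __init__(self) -> None:
        self._pending: Dict[str, Tuple[str, dict]] = {}  # key -> (session_id, data)
        self.committed: List[dict] = []

    def preview(self, session_id: str, data: dict) -> str:
        key = secrets.token_urlsafe(16)
        self._pending[key] = (session_id, dict(data))
        return key  # embedded in the "Confirm" form shown back to the user

    def confirm(self, session_id: str, key: str) -> bool:
        entry = self._pending.get(key)
        if entry is None or entry[0] != session_id:
            return False
        del self._pending[key]  # single use
        self.committed.append(entry[1])
        return True
```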

SMS Verification

What I really love about doing my online banking with Bank of America is that they require SMS verification to perform certain actions. For instance, when I add a new payee to online bill payments, they send an SMS to my phone with a special code. This special code must be entered into their site to confirm the new payee. This feature is incredibly useful on systems for which security is absolutely critical.
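Here is a minimal Python sketch of SMS-style verification: a random six-digit code with an expiry and a cap on wrong guesses. The 5-minute lifetime and 3-attempt limit are hypothetical choices, and actually sending the SMS is left as a commented placeholder because it depends on whichever gateway you use.

```python
import hmac
import secrets
import time
from typing import Dict, List, Optional

CODE_TTL_SECONDS = 300  # hypothetical: code valid for 5 minutes
MAX_ATTEMPTS = 3        # hypothetical: invalidate after 3 wrong guesses


class SmsChallenge:
    def __init__(self) -> None:
        self._codes: Dict[str, List] = {}  # user_id -> [code, expires_at, attempts_left]

    def issue(self, user_id: str, now: Optional[float] = None) -> str:
        code = f"{secrets.randbelow(10**6):06d}"
        current = time.time() if now is None else now
        self._codes[user_id] = [code, current + CODE_TTL_SECONDS, MAX_ATTEMPTS]
        # send_sms(phone_for(user_id), code)  <- hypothetical gateway call, not implemented
        return code

    def verify(self, user_id: str, submitted: str, now: Optional[float] = None) -> bool:
        entry = self._codes.get(user_id)
        if entry is None:
            return False
        code, expires_at, attempts_left = entry
        current = time.time() if now is None else now
        if current > expires_at or attempts_left <= 0:
            del self._codes[user_id]
            return False
        if hmac.compare_digest(code.encode(), submitted.encode()):
            del self._codes[user_id]  # a code works only once
            return True
        entry[2] -= 1
        return False
```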

Success, Current Challenges, and Weaknesses

Using the methods I’ve discussed above, I’ve not had one successful fraudulent form submission on one of my sites in its two-year existence. The caveat, however, is that none of my sites are huge-traffic websites. My most popular website ever has about 300,000 pageviews a month. As I said at the beginning of this post, the level of effort expended on abusing a system is directly proportional to the perceived benefit gained by the abuser. The #1 way to beat everything I outlined above, and to beat CAPTCHA, is to employ a human to do the abuse. Furthermore, each of the items above can be beaten by bots in some way. They have worked so well for me because each one adds another layer of protection. Overall, I think the most effective approaches have been the email confirmations, the use of BotSmasher, and the honeypot.

UPDATE (13 November 2013): In addition to the CAPTCHA-less steps above, something must be done to thwart repeated attempts by bad guys. One thing I’ve noticed is that bad guys using bots or other automated means of submitting forms will keep doing so, even though they are ultimately unsuccessful, as long as they think the form submission is succeeding. For example, fraudulent registrations on my site A11yBuzz recently went through the roof. Because the bad guys never confirmed their registrations, no spam was successfully submitted, but the nearly non-stop submissions of the registration form were effectively a denial-of-service attack. My answer so far has been to monitor the server logs to determine the IP address(es) responsible for these continuous attempts and to ban those IPs at the firewall level. Eventually I’ll develop something that does this automatically, though I’m sure such tools already exist among those used by server administrators at large organizations.
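Automating the log-watching step could start with a sliding-window counter per IP. This Python sketch only flags offenders. The 60-second window and 10-submission limit are hypothetical thresholds, and turning flagged IPs into firewall rules is left to whatever tooling the server already uses.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional, Set

WINDOW_SECONDS = 60   # hypothetical window length
MAX_SUBMISSIONS = 10  # hypothetical: more than this per window is treated as a flood


class SubmissionFlood:
    def __init__(self, window: float = WINDOW_SECONDS, limit: int = MAX_SUBMISSIONS) -> None:
        self.window = window
        self.limit = limit
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)
        self.flagged: Set[str] = set()  # candidates for a firewall-level ban

    def record(self, ip: str, now: Optional[float] = None) -> bool:
        """Record one submission; return True if this IP is (now) flagged."""
        current = time.monotonic() if now is None else now
        hits = self._hits[ip]
        hits.append(current)
        # Drop timestamps that have slid out of the window.
        while hits and current - hits[0] > self.window:
            hits.popleft()
        if len(hits) > self.limit:
            self.flagged.add(ip)
        return ip in self.flagged
```

Once flagged, an IP stays flagged, matching the permanent bans described earlier; a gentler variant could expire flags after some period.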

The solutions I’ve discussed aren’t meant to be the ultimate replacement for CAPTCHA. Instead, I hope I’ve shown that you can and should apply other sensible security approaches before resorting to CAPTCHA. For the vast majority of cases, the methods listed above should suffice. As a final case in point, consider Amazon, which has only one CAPTCHA on the entire site: the form you use to change your password. I would argue that if Amazon has figured out a way to be secure without CAPTCHA, so can you.

If you or your organization need help with accessibility consulting, strategy, or accessible web development, email me directly at karl@karlgroves.com or call me at +1 443-875-7343.

8 Comments

  • Posted April 3, 2012 at 1:18 pm

    I want to say ‘Thank you’ for not being one of those accessibility people that just say, “You can’t use CAPTCHA!” Giving alternatives is more valuable than just saying, “Don’t do that!”

    • karlgroves
      Posted April 3, 2012 at 1:23 pm

      Keith, thanks for taking the time to respond. I believe it is important for accessibility people to understand that we need to offer solutions to problems, not just more problems. When we see someone doing things that are inaccessible, I feel it is our responsibility to propose sound, effective solutions. Thanks for stopping by!

  • Posted April 3, 2012 at 2:32 pm

    Another accessible alternative to captchas is to implement anti-spam questions, like a simple operation (“2+3”, for example) or a question containing the answer like “What is the colour of a famous king’s white horse?”.

    As far as the honeypot is concerned, I would not position it absolutely but would use display: none instead, so that screen readers do not read the honeypot, which can confuse honest screen reader users when they fill in the form.

  • karlgroves
    Posted April 4, 2012 at 6:36 am

    Victor, thanks for the response. Using the display property to hide the honeypot would be a good approach, too.

    One related thing I forgot to mention is that I would caution against anyone using INPUT elements with the ‘type’ attribute of “hidden” though, as bots would ignore these.

    Using logic problems is a good alternative that should be weighed against the potential cost:benefit of the user. How many logic problems can you come up with? A dozen? A few dozen? The thing about CAPTCHA is there can be millions of potential combinations. With logic problems, you’re limited by time. After a certain (short) amount of time, you’ll run out of creative and easy logic problems. If there’s something of value to be gained by the abusive party, they can easily sit there and refresh the page to catalog all of the questions so they can program their bot to respond.

    Still, logic problems are probably very effective for small websites. Going back to what I said about level of effort vs. perceived benefit, most small websites won’t seem worth the effort.

  • darrenm
    Posted April 6, 2012 at 2:22 am

    Hi Karl,

    What a fantastic post, I really appreciate you taking the time to explain some of the alternatives to implementing a captcha.

    The honeypot, for example, I never knew about. As the person responsible for overseeing website development at a digital agency, I will certainly be using it from now on.

    My past experience with reCAPTCHA (it may have improved since then), when pair testing it with a blind user on one of our web builds, was that the background noise added to the audio to stop bots from cheating made it completely unusable. Neither I nor my testing partner could ever successfully complete a single submission using the audio alternative alone.

    Thanks for sharing,

    Darren.

  • Posted April 10, 2012 at 11:57 pm

    As Victor notes, the honeypot would be much better styled using CSS display:none; rather than off-screen. This way it would not be read by screen reader users and would be removed from keyboard navigation. I also recommend placing it after the Submit button (but still within the form element). Bots don’t know the difference.

    I wrote about several other possibilities a few years ago at http://webaim.org/blog/spam_free_accessible_forms/

    On our site, we use a short “naughty” word list, a simple honeypot, and time detection (if it takes less than 3 seconds or more than 40 minutes to submit the form, it’s rejected). This has cut our bot submissions from several thousand per month to < 10 per month.

  • Posted April 13, 2012 at 7:36 am

    I’ve applied a few of these non-CAPTCHA techniques, essentially enforcing that a visitor has made at least one GET request before they POST. On top of this, I reject submissions where the POST came too soon after the GET.

  • felgall
    Posted November 13, 2013 at 4:20 pm

    This page lists about half a dozen different CAPTCHAs that you are using.

    The only difference between the CAPTCHAs you are using and the ones you mention at the start that you are not using is that the ones you are not using use images and the CAPTCHAs that you are using do not use images.

4 Trackbacks

  • By tips, tricks, aps | Pearltrees on April 4, 2012 at 7:35 am

    […] CAPTCHA-less Security | Karl Groves Input filtering is the method by which you validate all incoming data and prevent any invalid data from being used by your application. […]

  • […] a try? There’s a great article by Karl Groves with instructions on how to get you started! http://www.karlgroves.com/2012/04/03/captcha-less-security/ Big thanks to Deborah Edwards-Onoro @redcrew for sharing this […]

  • […] CAPTCHA: It’s the worst offender for many users, especially those who have visual disabilities. Not only words are distorted beyond recognition, but also audio is not clear either. For these reasons, blind users created a petition to kill CAPTCHA. Please support this cause by signing the petition and stopping using CAPTCHA in your online forms. CAPTCHA is not only inaccessible, but also annoying even to many users without visual disabilities, including myself. There are several solutions to this proposed by Karl Groves in his article, CAPTCHA-less Security. […]

  • […] are various other techniques that can be used to defeat spam bots as listed by this article by web accessibility consultant Karl Grove on CAPTCHA-less security that not only improve the accessibility of your site but the usability for everyone as well. The […]
