Karl Groves

Tech Accessibility Consultant
+1 443.875.7343

Feature misuse !== feature uselessness

Ugh. Longdesc. For those who don’t follow such things, the fight over the longdesc attribute in HTML5 goes back to (at least) 2008. Back then, the WHATWG was also considering eliminating the alt attribute, the summary attribute, and table headers. Ian Hickson’s blatant and laughable egotism led him to believe he knew more about accessibility than the many actual accessibility experts he was arguing with. In this context, it is no wonder that a lot of people have gotten to the point of just being sick of the topic of longdesc, instead preferring to concentrate on more impactful concerns in accessibility.

While I agree with a lot of the arguments made in Edward O’Connor’s Formal Objection to advancing the HTML Image Description document along the REC track, I feel strongly compelled to address its use of a tired argument that I can summarize as “Because web developers misunderstand or misuse a feature, the feature must be bad.” In fact, I first responded to this type of argument six years ago on the HTML5 mailing list, where I stated:

The notion that the decision to keep or eliminate an attribute based on whether it gets misused by authors is amazingly illogical. I would challenge the author to eliminate every element and attribute which is “widely misused” by authors.

For nearly a dozen years now, I’ve been employed in a capacity which gives me a day-to-day glimpse of how professional web developers are using markup. I see HTML abuse on a daily basis. Bad HTML permeates the web due to ignorant developers and is exacerbated by shitty UI frameworks and terrible “tutorials” by popular bloggers. In my years as an accessibility consultant I’ve reviewed work on Fortune 100 websites and many of the Alexa top 1000. I’ve reviewed web-based applications of the largest software companies in the world. The abuse of markup is ubiquitous.

  • I’m working with a client right now who has over 1,600 issues logged in their issue tracking system related to accessibility alone. Several dozen of those issues relate to missing ‘name’ attributes on radio buttons.
  • Across 800,000 tested URLs, Tenon.io has logged an average of 42 accessibility issues per page. This number is statistically significant.
  • The average audit report by The Paciello Group is 74 pages long. I recently finished a report that was over 37,000 words long.

Regardless of your position on longdesc, citing developer misuse is little more than a red herring.

I’m available for accessibility consulting, audits, VPATs, training, and accessible web development, email me directly at karl@karlgroves.com or call me at +1 443-875-7343

Video: Prioritizing Remediation of Accessibility Issues (from ID24)

The Paciello Group has recently uploaded all of the sessions from the Inclusive Design 24 event that was held on Global Accessibility Awareness Day. My session was titled “Prioritizing Remediation of Accessibility Issues” and was described as follows:

Once you have a report from an accessibility consultant, automated tool, or your QA team, now what? Not all issues are created equal. This session will discuss the various factors which must be weighed in order to make the most effective use of developer time and effort while also having the best possible results for your users.

Watch my video below, including repeated cameos by my mastiff, Poppy, and take a look at the whole playlist.

Announcing the Viking & the Lumberjack

At CSUN 2014, Billy Gregory and I gave a presentation titled No Beard Required. Mobile Testing With the Viking & the Lumberjack. The presentation was an absolute disaster. Our approach was to “wing it”, showing how to test with various mobile technologies. Thing is, none of the mobile technologies actually cooperated with us. The good news – for us at least – is that Billy and I were entertaining enough for Mike Paciello to have a crazy idea of his own: a web video series called, appropriately, the Viking and the Lumberjack. Today we launch the first of (hopefully) many episodes in which Billy Gregory and I both entertain and inform. We hope you enjoy!

Video of my talk from Open Webcamp 2014

[Part 1] The Newb’s Crash Course in Test Driven Development, including Git, Grunt, Bower, and QUnit

Since we launched the private beta of Tenon.io, the feedback has been really positive and, frankly, energizing. But we have more work to do before we’re ready to open the whole thing up to the public. Much of that work centers around tests. We need more tests. Right now, we have a backlog of about 65 tests to write. Some of those tests require additional utility methods so we can keep things DRY. As I was writing one such method, I thought it might be a good topic for an intro to what I call modern web development techniques. I covered this in my recent presentation at Open Webcamp, titled The new hotness: How we use Node, Phantom, Grunt, Bower, Chai, Wercker, and more to build and deploy the next generation of accessibility testing tools (an obnoxiously long title, I know).

In this tutorial I’m going to go over the basics of starting and scaffolding a project, and give a very quick intro to Test Driven Development. There are a ton of details, nuances, and considerations that I’m going to mostly gloss over, because this tutorial touches on a lot of things and each of them is worthy of multiple blog posts in its own right. There are links throughout this post where you can find a lot more info on these various topics, and I really encourage you to explore them.

The general principle of Test Driven Development is this: if you know your requirements, you know the acceptance criteria. Write tests (first) which check whether you’ve met the acceptance criteria. Then write the code that passes the tests. This approach has multiple benefits, especially when it comes to quality. If you write good tests and you’re passing those tests, then you’re avoiding bugs. Also, as new code is added, if the new code passes its own tests but causes prior tests to fail, those failures flag the new bug before it ships. This assumes, of course, that you’re writing good tests. At Tenon, we’ve seen our own bugs arise from tests that didn’t take some edge cases into consideration. In my opinion, this demonstrates the best part about TDD, because all we needed to do was add a new test fixture that matched the failed case, modify the code, confirm that it passed the test, and the bug was squashed.

Some background preparation

In this tutorial I’m only making a really tiny jQuery plugin, but we’re going to pretend it is an actual project.
Every single project I embark on has a local development environment and its own repository for version control. Over many years, I’ve learned the hard way that I can’t have a single dev server for everything I do, and that version control is critical. Chances are pretty high that I’ll eventually need to re-use, refactor, or expand on something, even if I consider it purely experimental at the time.

So, the first step for me is always to create the project and set up version control. I use Git for version control and Bitbucket to host the repositories. I type these commands in Terminal to get everything started:

mkdir /path/to/your/project
cd /path/to/your/project
git init
git remote add origin git@bitbucket.org:karlgroves/jquery-area.git

So, for the newbs: I created the folder to hold the project using mkdir, moved into it using cd, initialized the repository using git init, and then added the remote location using git remote add origin. The next step I often take is to set up the new host in MAMP, but in this case I don’t need to, since it is just a small jQuery plugin.

Every bit of code discussed in this tutorial can be found on Bitbucket at https://bitbucket.org/karlgroves/jquery-area. To download & use that code to follow along, do this:

mkdir /path/to/your/project
cd /path/to/your/project
git clone git@bitbucket.org:karlgroves/jquery-area.git
cd jquery-area

Every feature must be driven by a need

I’m a very strong proponent of Agile software development processes and a very strong believer in requirements driven by a user-oriented need, often referred to as a User Story. Good user stories follow the INVEST pattern. Once a User Story has been defined, it is broken down into the distinct tasks that need to be performed to complete the story. For most user stories, there are likely to be multiple tasks. For this tutorial our user story is simple:

As a test developer, I want to be able to create tests which check for an actionable object’s dimensions.

Given the above, we then need to determine what tasks must be performed in order to complete the story. Since we’re testing for an actionable object’s final dimensions – and because we use jQuery – we want to test the values returned for .innerHeight() and .innerWidth(). These include padding but exclude border and margin, which aren’t part of the “hit area” for actionable items. We also want to determine the overall area of the object. So our task in this case is pretty simple:

Create a jQuery plugin that will calculate an object’s dimensions

We determined that this story has a single task, because that is all it requires. But we also determined that, down the road, we may need to measure more than just actionable objects, so we’ll let it be used for any element. In reality, this plugin will only work for HTML elements that can take up space. Some elements, like <br>, don’t take up any space, but we won’t be using it on them.
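
Here is one possible shape for that plugin, as a hedged sketch rather than the code in the repository. The method name area and the returned width, height and area properties are assumptions for illustration; the only jQuery methods it relies on are the real .innerWidth() and .innerHeight(). The plugin is wrapped in a hypothetical installAreaPlugin function that receives jQuery, which also makes it possible to try it in Node with a stand-in object instead of a browser.

```javascript
// Hypothetical sketch of a jQuery "area" plugin; the method name and the
// shape of the returned object are illustrative assumptions.
function installAreaPlugin($) {
  $.fn.area = function () {
    // Like other jQuery getters, this measures the first matched element.
    // innerWidth()/innerHeight() include padding but not border or margin,
    // which matches the "hit area" of an actionable element.
    const width = this.innerWidth();
    const height = this.innerHeight();
    return { width, height, area: width * height };
  };
  return $;
}

// In a browser page that already loads jQuery:
//   installAreaPlugin(jQuery);
//   jQuery('#submit-button').area();

// Outside the browser, a stand-in for a jQuery collection is enough to try it.
const fake$ = installAreaPlugin({ fn: {} });
const fakeButton = Object.create(fake$.fn);
fakeButton.innerWidth = () => 120;
fakeButton.innerHeight = () => 44;
console.log(fakeButton.area()); // { width: 120, height: 44, area: 5280 }
```

In the real project the plugin would be developed test-first against QUnit fixtures in the browser; the stand-in here only demonstrates the calculation.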

Set up the project

In reality, this sort of simple plugin doesn’t require its own project, but go with me here.

The first step, after creating a local folder and setting up the Git repository, is to “scaffold” the project, that is, to get the basic structure in place. One of the best ways to do this is to use Yeoman. Depending on the nature of your project, there may already be an official Yeoman generator for it. In fact, Sindre Sorhus has already created one for jQuery plugins. No matter your approach, it makes sense to start out with a basic structure for your project.

I didn’t use the Yeoman generator for this project, mostly because I have my own set of preferences. If I planned on making a habit of writing jQuery plugins, the best approach would be to fork the Yeoman generator and use it as the basis for my own. Either way, here’s how my structure winds up:

  • Folders
    • src – this is where the source file(s) go. For instance, in a big project involving multiple files, there may be several files which get concatenated or compiled (or both) later
    • dist – this is the final location of the files to be used later. For instance, a project like Bootstrap may have several files in ‘src’ which get concatenated and minified for distribution here in the ‘dist’ folder
    • test – this is where the unit test files go
  • Files in the project root. This holds many of the project-related files, such as configuration files. Many of these files help other developers on the project work efficiently by establishing shared settings at the project level.
    • .bowerrc – this is a JSON formatted configuration file. There are a lot more interesting things you can do with this file, but all we’re going to do is tell it where our bower components are located.
    • .editorconfig – this is a file to be shared with other developers to share configuration settings for IDEs for things like linefeed styles, indentation styles, etc. This (helps) avoid silly arguments over things like tabs vs. spaces, character encodings, etc.
    • .gitattributes – this is another file for project-level configuration, such as telling Git how to normalize line endings for particular file types
    • .gitignore – this lets you establish some files to be ignored by git. You can even find tons of example .gitignore files
    • .jshintrc – One of the Grunt tasks we’ll be talking about is JSHint: “… a community-driven tool to detect errors and potential problems in JavaScript code and to enforce your team’s coding conventions.” The options for the JSHint task can either be put directly into your Gruntfile or into an external file like this one.
    • jscs.json – this is a configuration for a coding style enforcement tool called JSCS.
    • CONTRIBUTING.md – common convention for open source projects is to add this file to inform possible contributors how they can help and what they need to know.
    • LICENSE – another convention is to provide a file as part of the repository which explains the appropriate license type for the project.
    • README.md – finally, in terms of convention, is the README file which provides an overview of the project. The README file often includes a description of what the project is all about and how to use it.
    • jquery manifest file (area.jquery.json) – If you plan on publishing a jQuery plugin, you need to create a JSON-formatted package manifest file to describe your plugin
    • package.json – This JSON-formatted file allows you to describe your project according to the CommonJS package format and describes things like your dependencies and other descriptive information about the project
    • Gruntfile.js – This file allows you to define and configure the specific tasks you’ll be running via Grunt.
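
If you’d rather not use Yeoman, the folder layout above is simple enough to create with a few lines of Node. This is a minimal sketch, not a replacement for a generator: it creates src, dist and test, and writes a .bowerrc whose directory key tells Bower where to put components. The createSkeleton helper and the bower_components folder name are assumptions for illustration; the other root files listed above are left for you to add.

```javascript
// Minimal scaffolding sketch using only Node's built-in modules.
// createSkeleton() is a hypothetical helper, not part of any generator.
const fs = require('fs');
const path = require('path');

function createSkeleton(root) {
  // The three working folders described above.
  for (const dir of ['src', 'dist', 'test']) {
    fs.mkdirSync(path.join(root, dir), { recursive: true });
  }

  // .bowerrc is JSON; "directory" tells Bower where to install components.
  const bowerrc = { directory: 'bower_components' };
  fs.writeFileSync(path.join(root, '.bowerrc'), JSON.stringify(bowerrc, null, 2) + '\n');

  // A starter .gitignore so installed dependencies stay out of the repository.
  fs.writeFileSync(path.join(root, '.gitignore'), 'node_modules/\nbower_components/\n');

  // Return the sorted top-level entries so the result is easy to inspect.
  return fs.readdirSync(root).sort();
}

// Example: scaffold into the current working directory.
// console.log(createSkeleton(process.cwd()));
```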

Task Automation via Grunt

As described above, we’re going to be using Grunt to automate tasks. To use Grunt, you first need to install Node. Once you have Node installed, all you need to do is install the Grunt command-line interface globally via the Node Package Manager (npm), like so:

npm install -g grunt-cli

If you were starting your project from scratch, you’d find the plugins you need and follow each one’s installation instructions. Usually the install requires little more than running:

npm install PLUGIN_NAME_HERE --save-dev

So, installing the Grunt JSHint plugin would be:

npm install grunt-contrib-jshint --save-dev

For this tutorial, if you’ve cloned the repo for the jQuery area plugin, run this from the project folder instead:

npm install && bower install

This will install all of the dev dependencies for the Grunt tasks, as well as the jQuery and QUnit files needed for testing. (The bower command assumes Bower itself is already installed, for example via npm install -g bower.)

Let’s back up a second: what is Grunt?

Grunt is a “JavaScript task runner”. The goal of Grunt is to facilitate the automation of developer tasks. I discussed automation in an earlier blog post. Like any other tool, its purpose is to let us either do things more efficiently or do things we couldn’t otherwise do at all:

As tools and technologies continue to evolve, mankind’s goals remain the same: make things easier, faster, better and make the previously impossible become possible.

There are some tasks that developers do over and over during their regular day-to-day work which are made far easier through automation. There are even some automation-related tasks developers do which can be further automated. In this regard, Grunt can be seen as a way to apply DRY even to human effort. I’m a huge fan of that idea.

The specific Grunt plugins we’ll be using are:

  • Load Grunt Tasks (load-grunt-tasks) – this automatically loads all of the Grunt plugins listed in package.json, so we don’t need a separate grunt.loadNpmTasks() call for each one.
  • Time Grunt (time-grunt) – this will show how long each task takes. This can be pretty important when running a lot of tasks, or a single task (like a bunch of unit tests) that takes a long time.
  • Clean (grunt-contrib-clean) – we’ll be using this one to simply clean out the ‘dist’ folder prior to adding the final compiled plugin.
  • Watch (grunt-contrib-watch) – this is a hugely beneficial task for us, because it will allow us to automatically run specific tasks whenever new changes are saved. For instance, we can set it up so that whenever the plugin file is changed, it runs JSHint and JSCS on it.
  • JSHint (grunt-contrib-jshint) – This task does some syntax checking of JavaScript files to detect potential errors. This kind of task can help you avoid pretty silly bugs based on simple mistakes.
  • JSON Lint (grunt-jsonlint) – A bit like JSHint, this does syntax checking on JSON files. For us, this specifically saves us from problems with our configuration files, which would in turn keep our tasks from running properly.
  • JSCS Checker (grunt-jscs-checker) – JavaScript Code Style Checker, or JSCS, allows us to enforce coding style conventions for our project.
  • QUnit (grunt-contrib-qunit) – QUnit is the JavaScript unit testing framework we’ll be using; this task runs the QUnit tests in a headless browser.
  • Connect (grunt-contrib-connect) – This task starts a local Connect web server, so pages such as the QUnit test runner can be served over HTTP.
  • Uglify (grunt-contrib-uglify) – This task will do code minification on our plugin file and place it in the ‘dist’ folder.

Our Workflow: How Grunt and QUnit come into play

In this scenario we’re going to have some ‘watch’ tasks that run while we’re developing, primarily to make sure we don’t make silly coding style mistakes. Along the way, we’ll do test-driven development: defining our acceptance criteria and coding to meet them. Grunt allows us to automate tasks that we, as developers, perform repetitively. As I’ve said in other posts:

In any case where a capability exists which can replace or reduce human effort, it only makes sense to do so. In any case where we can avoid repetitious effort, we should seek to do so.

This is exactly where tools like Grunt and Gulp truly shine. Instead of repetitively saving the files, then running jshint, then jscs, then qunit, then minifying the source, then copying it over to our dist folder, we can avoid that tedium through automation. We can establish a series of tasks, configured to our preferences, to be automatically run while we work, thus increasing our efficiency and quality.

Up next in Part 2: Actual TDD

At 2200+ words already, we’ll have to reserve the discussion of the TDD process for Part 2. We’ll go through defining the tests, creating fixtures, and writing the code. Stay tuned!

What an incredible start to today

This morning I’m sitting here in bed as I do virtually every morning: working on one of many programming projects. Sometimes they’re “official” projects and sometimes just experiments. The big difference today is that I’m in San Jose, CA, for Open Web Camp. About 45 minutes ago I spoke briefly with my wife. As I put my phone down, I noticed a missed call and voicemail from a number I didn’t recognize.

Usually when I get calls like that, they’re from people who want a VPAT written urgently. They find this site after Googling for “VPAT”, land on this post, and then call me to tell me they need a VPAT written urgently. I refer them to Brian Landrigan at The Paciello Group and that’s that. But this call was different:

“I’m not sure if this is the right number to call or not. Based on your voicemail greeting, you sound like you might be younger than the person I’m looking for, but I’m looking for a relative of Fred T. Groves…”

I hung up on the voicemail and called the number right back.

Fred Groves was my uncle. He was a US Marine who died on Iwo Jima in World War II. He was just a boy when he died. My grandfather had signed a special permission form for Fred to join the Marines at 17, and Fred was barely 18 when he died on Iwo Jima. In other words, he wouldn’t even have been at Iwo Jima if he had waited until he was 18 to enlist. It tore my grandfather apart. Naturally, I never got to meet Fred. In fact, my own father was only 4 years old when Fred died.

I spent a few moments on the phone with the guy who had called me. He must’ve been around my age. He was calling on behalf of his father-in-law, and he told me the stories his father-in-law had about Fred and how he had tried to “look after” Fred because he was so young. He was there – literally there – at Fred’s last moments on this planet. To this day, nearly 70 years after Iwo Jima, this guy’s father-in-law still talks about Fred.

It makes me both proud and sad to have had a relative that touched others so deeply and yet to have been lost so early.

Affirming the Consequent

Today I came across a post by Simon Harper titled Web Accessibility Evaluation Tools Only Produce a 60-70% Correctness, which is essentially a response to my earlier critique of a seriously flawed academic paper. I submitted a response on Simon’s site, but I want to copy it here for my regular readers. One thing that specifically bothers me is that the responses continue to dodge the specific challenges I raise. You cannot claim something without evidence, and you cannot supply data for one thing and claim that it leads to additional, wholly unrelated conclusions. So, here goes:


Good post, and thank you for the response. It is unfortunate, however, that you didn’t read or respond to what I wrote. It is also unfortunate that the paper’s authors have similarly chosen not to respond directly to my statements. The blanket response “well, just replicate it” is an attempt at dodging my response and my [specific] criticisms of the paper (which, again, you admittedly haven’t read). Furthermore, there’s little use in attempting to perform the same experiments when the conclusions presented have nothing to do with the data.

You said:
“Web accessibility evaluation can be seen as a burden which can be alleviated by automated tools.”
Actually, they don’t say that.

“In this case the use of automated web accessibility evaluation tools is becoming increasingly widespread.”
No data is supplied for this at all.

“Many, who are not professional accessibility evaluators, are increasingly misunderstanding, or choosing to ignore, the advice of guidelines by missing out expert evaluation and/or user studies.”
No data is supplied for this at all.

“This is because they mistakenly believe that web accessibility evaluation tools can automatically check conformance against all success criteria.”
No data is supplied for this at all.

“This study shows that some of the most common tools might only achieve between 60 and 70% correctness in the results they return, and therefore makes the case that evaluation tools on their own are not enough.”

Of all the things you said, this is the only thing actually backed by the data from the paper. Literally everything else is a case of affirming the consequent.

The data that they do present is very compelling and matches my own experience. The significant amount of variation between the tools tested was pretty shocking as well, and once you get past the unproven, hyperbolic claims, it is very interesting.

If this paper’s authors were to gather and present actual data regarding usage patterns (re: the claim that “the use of automated web accessibility evaluation tools is becoming increasingly widespread”) then I wouldn’t be so critical. There is no question that the data needed to substantiate this and similar statements simply isn’t supplied.

Finally, I’d like to address the statement “evaluation tools on their own are not enough”. As I say in my blog post, this is so obvious that it is hardly worth mentioning. No legitimate tool vendor claims otherwise. I’ve been working as an accessibility consultant for a decade. I’ve worked for, alongside, or in competition with all of the major tool vendors and have never heard any of them say that using their tool alone is enough. Whether end users think this or not is another matter. Again, it’d be great if the paper’s authors had data to show this happening, since they claim that it is.

The implication from this paper is that because tools do not provide complete coverage, they should not be used. This is preposterous and, I believe, born from a lack of experience outside of accessibility and a lack of experience in a modern software development environment. Automated testing, ranging from basic static code linting, to unit testing, to automated penetration testing, is the norm and for good reason: it helps increase quality. But ask *any* number of skilled developers whether “passing” a check by JSHint means their JavaScript is good and you’ll get a universal “No”. That doesn’t stop grunt-contrib-jshint from being the most downloaded Grunt plugin (http://gruntjs.com/plugins). Ask any security specialist whether using IBM’s Rational Security is enough to ensure a site is secure, and they’ll say “No”. That doesn’t diminish its usefulness as a *tool* in a mature security management program.

Perhaps what we need most in terms of avoiding an “over-reliance” on tools is for people to stop treating them like they’re all-or-nothing.

Tutorial: Creating a PHP class to use with Tenon.io


Just wanna get the code? All of the code for this tutorial is available in an open repository on Bitbucket.

Tenon.io is an API that facilitates quick and easy JavaScript-aware accessibility testing. The API accepts a large number of request parameters that allow you to customize how Tenon does its testing and returns its results. Full documentation for client developers is available in a public repository on Bitbucket. Because Tenon is an API, getting it to do your accessibility testing requires a little bit of work, though relatively little: you submit a request and deal with the response. This blog post shows an example of how to do that with a simple PHP class and also provides a method of generating a CSV file of results.

Despite the fact that it is an API, you can create a simple app very easily. First thing, of course, is that you need a Tenon.io API key. Go to Tenon.io to get one. Right now, Tenon is in Private Beta. If you’re interested in getting started right away, email karl@tenon.io to get your key. The second thing is you need a PHP-enabled server with cURL. Most default installs of PHP on web hosts will have it. If not, installation is easy.

How to use this class

Using this class is super easy. In the code chunk below, we’re merely going to pass some variables to the class and get the response. This is not production-ready code. There are a lot of areas where this can be improved. Use this as a starting point, not an end point.

require_once 'tenon.class.php';

define('TENON_API_KEY', 'this is where you enter your api key');
define('TENON_API_URL', 'http://www.tenon.io/api/');
define('DEBUG', false);

$opts = array();
$opts['key'] = TENON_API_KEY;
$opts['url'] = 'http://www.example.com'; // enter a real URL here, of course

$tenon = new tenon(TENON_API_URL, $opts);
$tenon->submit(DEBUG); // sends the request and populates $tenon->tenonResponse

Using the code chunk above, you now have a variable, $tenon->tenonResponse, formatted according to the Tenon response format (read the docs for full details.)

That’s it! From there, all you need to do is massage that JSON response into something useful for your purposes.

Let’s walk through a class that can help us do that.

Give it a name

First, create a file called tenon.class.php. Then start your file like so:

<?php
class tenon
{
    // the properties and methods below all go inside the class body
}

Declare some variables

Now, at the top of the class we want to declare some variables:

  • $url – this will be the URL to the Tenon.io API itself.
  • $opts – this will be an array of your request parameters.
  • $tenonResponse – this will be populated by the JSON response from Tenon.
  • $rspArray – this will be a multidimensional array of the decoded response.

    protected $url, $opts;
    public $tenonResponse, $rspArray;

Class Constructor

Time to get to our actual class methods. First up is our class constructor. Since constructors in PHP cannot return a value, we just set up some instance variables to be used by other methods. The arguments are the $url and $opts variables discussed above.

    /**
     * Class constructor
     * @param   string $url  the API url to post your request to
     * @param    array $opts options for the request
     */
    public function __construct($url, $opts)
    {
        $this->url = $url;
        $this->opts = $opts;
        $this->rspArray = null;
    }

Submit your request to Tenon

Next up is the method that actually fires the request to the API. This function is nothing more than a wrapper around some cURL stuff. PHP’s functionality around cURL is excellent and makes it perfect for this type of purpose.

This method sends our request parameters (from the $opts array) to the API as a POST request and populates $tenon->tenonResponse with the JSON response from Tenon.

    /**
     * Submits the request to Tenon
     * @param   bool $printInfo whether or not to print the output from curl_getinfo (usually for debugging only)
     * @return    string    the results, formatted as JSON (false if the request failed)
     */
    public function submit($printInfo = false)
    {
        if (true === $printInfo) {
            echo '<h2>Options Passed To TenonTest</h2><pre><br>';
            print_r($this->opts);
            echo '</pre>';
        }

        //open connection
        $ch = curl_init();

        curl_setopt($ch, CURLOPT_URL, $this->url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_FAILONERROR, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, $this->opts);

        //execute post and get results
        $result = curl_exec($ch);

        if (true === $printInfo) {
            echo 'ERROR INFO (if any): ' . curl_error($ch) . '<br>';
            echo '<h2>Curl Info </h2><pre><br>';
            print_r(curl_getinfo($ch));
            echo '</pre>';
        }

        //close connection
        curl_close($ch);

        //the test results
        $this->tenonResponse = $result;

        return $result;
    }

Decode the response

From here, how you deal with the JSON is up to you. Most programming languages have ways to deal with JSON. PHP has some native functionality, albeit simple, to decode and encode JSON. Below, we use json_decode to turn the JSON into a multidimensional array. This gives us the $tenon->rspArray to use in other methods later.

    /**
     * Decodes the JSON response into an associative array
     * @return mixed the decoded array, or false on failure
     */
    public function decodeResponse()
    {
        if ((false !== $this->tenonResponse) && (!is_null($this->tenonResponse))) {
            $result = json_decode($this->tenonResponse, true);
            if (!is_null($result)) {
                $this->rspArray = $result;
                return $this->rspArray;
            } else {
                return false;
            }
        } else {
            return false;
        }
    }

Make some sense of booleans

Tenon returns some of its information as ‘1’ or ‘0’. We’re going to want that to be more useful for human consumption, so we convert those to ‘Yes’ and ‘No’. Because of some weirdness with json_decode and PHP’s loose typing, sometimes digits are actually strings, so that’s why we’re not using strict comparison.

    /**
     * Converts Tenon's 1/0 values into Yes/No
     * @param $val
     * @return string
     */
    public static function boolToString($val)
    {
        if ($val == '1') {
            return 'Yes';
        }
        return 'No';
    }

Create a summary

OK, now it is time to start doing something useful with the response array. The first thing we need is a summary of how our request went and the status of our document. This method creates a string of HTML showing the following details:

  • Your Request – Tenon echoes back your request to you. This section reports the request that Tenon uses, which may include items set to their defaults.
  • Response Summary – This section gives a summary of the response, such as response code, response type, execution time, and document size.
  • Global Stats – This section gives some high-level stats on error rates across all tests run by Tenon. When compared against your document’s density (below), this is useful for getting an at-a-glance idea of your document’s accessibility.
  • Density – Tenon calculates a statistic called ‘Density’ which is, basically, how many errors you have compared to how big the document is. In other words, how dense are the issues on the page?
  • Issue Counts – This section gives raw issue counts for your document.
  • Issues By Level – This section provides issue counts according to WCAG Level.
  • Client Script Errors – One of the things that may reduce Tenon’s ability to test your site is JavaScript errors and uncaught exceptions. A cool feature of Tenon is that it reports these to you.

    /**
     * @return mixed
     */
    public function processResponseSummary()
    {
        if ((false === $this->rspArray) || (is_null($this->rspArray))) {
            return false;
        }

        $output = '';
        $output .= '<h2>Your Request</h2>';
        $output .= '<ul>';
        $output .= '<li>DocID: ' . $this->rspArray['request']['docID'] . '</li>';
        $output .= '<li>Certainty: ' . $this->rspArray['request']['certainty'] . '</li>';
        $output .= '<li>Level: ' . $this->rspArray['request']['level'] . '</li>';
        $output .= '<li>Priority: ' . $this->rspArray['request']['priority'] . '</li>';
        $output .= '<li>Importance: ' . $this->rspArray['request']['importance'] . '</li>';
        $output .= '<li>Report ID: ' . $this->rspArray['request']['reportID'] . '</li>';
        $output .= '<li>System ID: ' . $this->rspArray['request']['systemID'] . '</li>';
        $output .= '<li>User-Agent String: ' . $this->rspArray['request']['uaString'] . '</li>';
        $output .= '<li>URL: ' . $this->rspArray['request']['url'] . '</li>';
        $output .= '<li>Viewport: ' . $this->rspArray['request']['viewport']['width'] . ' x ' . $this->rspArray['request']['viewport']['height'] . '</li>';
        $output .= '<li>Fragment? ' . self::boolToString($this->rspArray['request']['fragment']) . '</li>';
        $output .= '<li>Store Results? ' . self::boolToString($this->rspArray['request']['store']) . '</li>';
        $output .= '</ul>';

        $output .= '<h2>Response</h2>';
        $output .= '<ul>';
        $output .= '<li>Document Size: ' . $this->rspArray['documentSize'] . ' bytes </li>';
        $output .= '<li>Response Code: ' . $this->rspArray['status'] . '</li>';
        $output .= '<li>Response Type: ' . $this->rspArray['message'] . '</li>';
        $output .= '<li>Response Time: ' . date("F j, Y, g:i a", strtotime($this->rspArray['responseTime'])) . '</li>';
        $output .= '<li>Response Execution Time: ' . $this->rspArray['responseExecTime'] . ' seconds</li>';
        $output .= '</ul>';

        $output .= '<h2>Global Stats</h2>';
        $output .= '<ul>';
        $output .= '<li>Global Density, overall: ' . $this->rspArray['globalStats']['allDensity'] . '</li>';
        $output .= '<li>Global Error Density: ' . $this->rspArray['globalStats']['errorDensity'] . '</li>';
        $output .= '<li>Global Warning Density: ' . $this->rspArray['globalStats']['warningDensity'] . '</li>';
        $output .= '</ul>';

        $output .= '<h3>Density</h3>';
        $output .= '<ul>';
        $output .= '<li>Overall Density: ' . $this->rspArray['resultSummary']['density']['allDensity'] . '%</li>';
        $output .= '<li>Error Density: ' . $this->rspArray['resultSummary']['density']['errorDensity'] . '%</li>';
        $output .= '<li>Warning Density: ' . $this->rspArray['resultSummary']['density']['warningDensity'] . '%</li>';
        $output .= '</ul>';

        $output .= '<h3>Issue Counts</h3>';
        $output .= '<ul>';
        $output .= '<li>Total Issues: ' . $this->rspArray['resultSummary']['issues']['totalIssues'] . '</li>';
        $output .= '<li>Total Errors: ' . $this->rspArray['resultSummary']['issues']['totalErrors'] . '</li>';
        $output .= '<li>Total Warnings: ' . $this->rspArray['resultSummary']['issues']['totalWarnings'] . '</li>';
        $output .= '</ul>';

        $output .= '<h3>Issues By WCAG Level</h3>';
        $output .= '<ul>';
        $output .= '<li>Level A: ' . $this->rspArray['resultSummary']['issuesByLevel']['A']['count'];
        $output .= ' (' . $this->rspArray['resultSummary']['issuesByLevel']['A']['pct'] . '%)</li>';
        $output .= '<li>Level AA: ' . $this->rspArray['resultSummary']['issuesByLevel']['AA']['count'];
        $output .= ' (' . $this->rspArray['resultSummary']['issuesByLevel']['AA']['pct'] . '%)</li>';
        $output .= '<li>Level AAA: ' . $this->rspArray['resultSummary']['issuesByLevel']['AAA']['count'];
        $output .= ' (' . $this->rspArray['resultSummary']['issuesByLevel']['AAA']['pct'] . '%)</li>';
        $output .= '</ul>';

        $output .= '<h3>Client Script Errors, if any</h3>';
        $output .= '<p>(Note: "NULL" or empty array here means there were no errors.)</p>';
        $output .= '<pre>' . var_export($this->rspArray['clientScriptErrors'], true) . '</pre>';

        return $output;
    }
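processResponseSummary() concatenates the values straight into HTML. Some of them, like the URL and user-agent string, are echoed back from whatever was submitted in the request, so escaping them before output is a reasonable precaution. Below is a sketch of a hypothetical renderList() helper (not part of Tenon or of the class above) that escapes each label and value and cuts down on the repeated <li> lines; the sample data is made up.

```php
<?php
/**
 * Hypothetical helper (not part of the class above): renders a heading plus a
 * <ul> from an array of label => value pairs, escaping both for HTML output.
 */
function renderList($heading, array $items, $level = 'h2')
{
    $output = '<' . $level . '>' . htmlspecialchars($heading) . '</' . $level . '>';
    $output .= '<ul>';
    foreach ($items as $label => $value) {
        // Cast to string so integers and floats from json_decode print normally
        $output .= '<li>' . htmlspecialchars($label) . ': ' . htmlspecialchars((string) $value) . '</li>';
    }
    $output .= '</ul>';
    return $output;
}

// Example using a trimmed, made-up slice of a decoded response
$rspArray = array(
    'resultSummary' => array(
        'issues' => array('totalIssues' => 3, 'totalErrors' => 2, 'totalWarnings' => 1),
    ),
);
$issues = $rspArray['resultSummary']['issues'];

echo renderList('Issue Counts', array(
    'Total Issues'   => $issues['totalIssues'],
    'Total Errors'   => $issues['totalErrors'],
    'Total Warnings' => $issues['totalWarnings'],
), 'h3');
```

The trade-off is that the heading level and list markup are now fixed by the helper, which suits the repetitive sections but not the WCAG level lines that combine two values per item.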

Output the issues

The most important part of Tenon is obviously the issues. The method below gets the issues and loops through them to print them out in a human-readable format. Each issue is presented to show what the issue is and where the issue is. For a full description of Tenon’s issue reports, read the Tenon.io Documentation.

    /**
     * @return   string
     */
    public function processIssues()
    {
        $output = '';
        $issues = $this->rspArray['resultSet'];

        $count = count($issues);

        if ($count > 0) {
            for ($x = 0; $x < $count; $x++) {
                // Number the issues starting at 1 for display
                $i = $x + 1;
                $output .= '<div class="issue">';
                $output .= '<div>' . $i .': ' . $issues[$x]['errorTitle'] . '</div>';
                $output .= '<div>' . $issues[$x]['errorDescription'] . '</div>';
                $output .= '<div><pre><code>' . trim($issues[$x]['errorSnippet']) . '</code></pre></div>';
                $output .= '<div>Line: ' . $issues[$x]['position']['line'] . '</div>';
                $output .= '<div>Column: ' . $issues[$x]['position']['column'] . '</div>';
                $output .= '<div>xPath: <pre><code>' . $issues[$x]['xpath'] . '</code></pre></div>';
                $output .= '<div>Certainty: ' . $issues[$x]['certainty'] . '</div>';
                $output .= '<div>Priority: ' . $issues[$x]['priority'] . '</div>';
                $output .= '<div>Best Practice: ' . $issues[$x]['resultTitle'] . '</div>';
                $output .= '<div>Reference: ' . $issues[$x]['ref'] . '</div>';
                $output .= '<div>Standards: ' . implode(', ', $issues[$x]['standards']) . '</div>';
                $output .= '<div>Issue Signature: ' . $issues[$x]['signature'] . '</div>';
                $output .= '<div>Test ID: ' . $issues[$x]['tID'] . '</div>';
                $output .= '<div>Best Practice ID: ' . $issues[$x]['bpID'] . '</div>';
                $output .= '</div>';
            }
        }
        return $output;
    }
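Once the issues are in an array, it is easy to slice them other ways. As one example, here is a sketch of a hypothetical countIssuesByStandard() function that tallies how many issues cite each entry in their 'standards' array, the same field processIssues() joins with implode(). The sample resultSet is invented and much smaller than a real one.

```php
<?php
/**
 * Hypothetical helper: tallies how many issues cite each entry in an issue's
 * 'standards' array (the same field processIssues() joins with implode()).
 */
function countIssuesByStandard(array $issues)
{
    $counts = array();
    foreach ($issues as $issue) {
        // Guard against issues that have no 'standards' entry
        $standards = isset($issue['standards']) ? $issue['standards'] : array();
        foreach ($standards as $standard) {
            if (!isset($counts[$standard])) {
                $counts[$standard] = 0;
            }
            $counts[$standard]++;
        }
    }
    // Most frequently cited standards first
    arsort($counts);
    return $counts;
}

// Made-up sample; real standards strings will differ
$resultSet = array(
    array('errorTitle' => 'Missing alt', 'standards' => array('1.1.1', '4.1.2')),
    array('errorTitle' => 'Empty link',  'standards' => array('2.4.4', '4.1.2')),
);

print_r(countIssuesByStandard($resultSet));
```

In a real script you would pass $tenon->rspArray['resultSet'] instead of the sample array.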

Full Usage Example

So now that we have the full class in place, let's put it all together. In the example below, we're taking our request parameters from a $_POST array, such as that which we'd get from a form submission.

define('TENON_API_KEY', 'this is where you enter your api key');
define('TENON_API_URL', 'http://www.tenon.io/api/');
define('DEBUG', false);

$expectedPost = array('src', 'url', 'level', 'certainty', 'priority',
    'docID', 'systemID', 'reportID', 'viewport',
    'uaString', 'importance', 'ref',
    'fragment', 'store', 'csv');

// Keep only the expected, non-empty POST values as request options
$opts = array();
foreach ($_POST AS $k => $v) {
    if (in_array($k, $expectedPost)) {
        if (strlen(trim($v)) > 0) {
            $opts[$k] = $v;
        }
    }
}

$opts['key'] = TENON_API_KEY;

$tenon = new tenon(TENON_API_URL, $opts);

if (false === $tenon->decodeResponse()) {
    $content = '<h1>Error</h1><p>No Response From Tenon API, or JSON malformed.</p>';
    $content .= '<pre>' . var_export($tenon->tenonResponse, true) . '</pre>';
} else {
    $content = $tenon->processResponseSummary();
    $content .= '<h2>Issues</h2>';
    $content .= $tenon->processIssues();
    $content .= $tenon->rawResponse();
}
echo $content;
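The foreach filter in the usage example can also be written with PHP's array functions. This is a sketch, assuming the same whitelist; filterOptions() is a hypothetical name, and it takes the input array as a parameter so you can try it without a form post. It also skips non-string values, since trim() on an array (for instance a field posted as name[]) raises a TypeError in PHP 8.

```php
<?php
/**
 * Hypothetical alternative to the foreach filter: keep only whitelisted keys
 * whose values are non-empty strings after trimming.
 */
function filterOptions(array $input, array $expected)
{
    // Drop keys that are not in the whitelist (keeps the input's order)
    $opts = array_intersect_key($input, array_flip($expected));
    // Drop values that are not strings or are blank once trimmed
    return array_filter($opts, function ($v) {
        return is_string($v) && strlen(trim($v)) > 0;
    });
}

$expectedPost = array('src', 'url', 'level', 'certainty', 'priority',
    'docID', 'systemID', 'reportID', 'viewport',
    'uaString', 'importance', 'ref', 'fragment', 'store', 'csv');

// Sample input standing in for $_POST: one good value, one blank, one unexpected key
$opts = filterOptions(array('url' => 'http://example.com/', 'level' => '  ', 'evil' => 'x'), $expectedPost);
$opts['key'] = 'your-api-key'; // placeholder, as with TENON_API_KEY
print_r($opts);
```

In the real script the call would be filterOptions($_POST, $expectedPost).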

That's it! You now have an HTML output of Tenon's response summary and issue details!

Screw it, just gimme the issues

OK, what if you just want the issues and none of that output-to-HTML stuff? Getting the issues into a CSV file is ridiculously easy with PHP. Add this method to your PHP class:

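    /**
     * @param $pathToFolder
     * @return bool
     */
    public function writeResultsToCSV($pathToFolder)
    {
        $url = $this->rspArray['request']['url'];
        $issues = $this->rspArray['resultSet'];
        // basename() keeps a docID such as "../x" from escaping the target folder
        $name = basename($this->rspArray['request']['docID']);
        $count = count($issues);

        if ($count < 1) {
            return false;
        }

        $rows = array();
        for ($x = 0; $x < $count; $x++) {
            // One value per column, in the same order as the header row below
            $rows[$x] = array(
                $url, $issues[$x]['tID'], $issues[$x]['resultTitle'],
                $issues[$x]['errorTitle'], $issues[$x]['errorDescription'],
                implode(', ', $issues[$x]['standards']), $issues[$x]['errorSnippet'],
                $issues[$x]['position']['line'], $issues[$x]['position']['column'],
                $issues[$x]['xpath'], $issues[$x]['certainty'], $issues[$x]['priority'],
                $issues[$x]['ref'], $issues[$x]['signature']
            );
        }

        // Put a row of headers up on the beginning
        array_unshift($rows, array('URL', 'testID', 'Best Practice', 'Issue Title', 'Description',
            'WCAG SC', 'Issue Code', 'Line', 'Column', 'xPath', 'Certainty', 'Priority', 'Reference', 'Signature'));

        if (!file_exists($pathToFolder . $name . '.csv')) {
            $fp = fopen($pathToFolder . $name . '.csv', 'w');
            if (false === $fp) {
                return false;
            }
            foreach ($rows as $fields) {
                fputcsv($fp, $fields);
            }
            fclose($fp);
            return true;
        }
        return false;
    }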
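fputcsv() takes care of quoting for you, which matters here because the WCAG SC column is built with implode(', ', ...) and so contains commas. This sketch writes one made-up row to an in-memory stream so you can see the quoting without creating a file. The separator, enclosure, and escape arguments are passed explicitly with their usual values, since some recent PHP versions warn when the escape argument is left out.

```php
<?php
// Write one made-up CSV row into an in-memory stream so the quoting can be
// inspected without touching the filesystem
$fp = fopen('php://memory', 'w+');
fputcsv($fp, array('http://example.com/', '1.1.1, 4.1.2', 'Line "12"'), ',', '"', '\\');
rewind($fp);
$line = stream_get_contents($fp);
fclose($fp);

echo $line;
```

Fields containing the comma, a space, or a double quote get wrapped in double quotes, and embedded quotes are doubled, so spreadsheet programs read the standards list back as a single cell.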

Then all you need to do is call it like this:

define('TENON_API_KEY', 'this is where you enter your api key');
define('TENON_API_URL', 'http://www.tenon.io/api/');
define('DEBUG', false);
define('CSV_FILE_PATH', $_SERVER['DOCUMENT_ROOT'] . '/csv/');

$expectedPost = array('src', 'url', 'level', 'certainty', 'priority',
    'docID', 'systemID', 'reportID', 'viewport',
    'uaString', 'importance', 'ref',
    'fragment', 'store', 'csv');

// Keep only the expected, non-empty POST values as request options
$opts = array();
foreach ($_POST AS $k => $v) {
    if (in_array($k, $expectedPost)) {
        if (strlen(trim($v)) > 0) {
            $opts[$k] = $v;
        }
    }
}

$opts['key'] = TENON_API_KEY;

$tenon = new tenonTest(TENON_API_URL, $opts);

if (false === $tenon->decodeResponse()) {
    $content = '<h1>Error</h1><p>No Response From Tenon API, or JSON malformed.</p>';
    $content .= '<pre>' . var_export($tenon->tenonResponse, true) . '</pre>';
    echo $content;
} else {
    if (false !== $tenon->writeResultsToCSV(CSV_FILE_PATH)) {
        echo 'CSV file written!';
    }
}

Now what?

This blog post shows how easy it is to create a PHP implementation that will submit a request to Tenon, do some testing, and return results. We want to see what you can do with it. Register at Tenon.io and get started!

I'm available for accessibility consulting, audits, VPATs, training, and accessible web development. Email me directly at karl@karlgroves.com or call me at +1 443-875-7343

Accessibility Consulting is Broken

I’ve had an epiphany. Accessibility Consulting, that process where a client hires us to go through their system, test it for accessibility issues, and submit a report to them, is fundamentally broken. My personal interpretation of our goal, as professionals, is to make money doing Good. Our advanced level of knowledge, skills, and experience can and should drive higher amounts of money while allowing us to do a greater amount of good, like a snowball of awesomeness.

The client hires the consultant to help ease pain. That pain may have an array of causes and may have a varying degree of acuteness, but it always has the same root cause: ignorance. Organizationally there is systemic ignorance surrounding the topic of accessibility. The client understands neither what accessibility is nor how to manage it, and that results in a failure to effectively modify their ICT development or procurement processes to ensure an accessible outcome. Recognizing this, they turn to the services of a consultant. In all likelihood the resulting engagement involves an audit of the client’s problematic ICT system(s), the deliverable for which is a report outlining the findings of this audit process.

No matter how detailed and no matter how perfect the guidance, the delivery of the report fails to ease the customer’s pain. It does not directly address either the symptoms or the cause of the disease. In fact, the more extensive the testing and the greater its scope, the more likely it is that the client will be paralyzed by the magnitude of what they’ve been told. This paralysis is often made worse in cases where other technical debt has been incurred through bad architectural choices and longstanding legacy front-end code.

During in-person training, I use the following story to illustrate this paralysis:

Two years ago, my wife and I decided to make a large number of renovations to our house:

  • Paint all 3 bedrooms, including ceilings
  • Paint the great room
  • Create custom closet shelving
  • Replace badly stained wood floors in hallway
  • Sand, stain, and refinish wood floors in all 3 bedrooms
  • New chair molding, baseboard moldings, and wood trim throughout the house
  • Replace wood floor in the great room
  • New stairs

Excited by the new beauty I envisioned for our house, I dove right in and started doing the necessary demolition. I ripped out all of the carpet that was covering the wood floors, removed all the old molding, and skillfully removed the wood floor in the hallway. Removing the wood floor in the hallway was quite easy. I even skillfully staggered the removal of the boards so that the new boards would blend in without looking like they were replaced. Then the paralysis started.

Demolition complete, I was faced with the exact extent of what was ahead of me. Everywhere I went in the house I was reminded of everything I had to do to finish it – some of which I really had no clue how to do. For instance, properly installing the new wood floor so it blended into the existing wood floor was far beyond my existing skill set. It scared me and, because so many other things on the list depended on it, I knew I had to do it but had no idea where to start. So I didn’t start on it. I kept a long list of excuses prepared for why I couldn’t do it.

In hindsight, the real reason I didn’t dive right in to start the work was that I viewed the work ahead of me as a single massive job: Renovate the house. It wasn’t until I changed my outlook and saw the work as a series of small, distinct tasks that I was able to tackle it. This is the same type of overwhelmed feeling clients tend to get when they’re delivered a huge accessibility audit report. The lower their existing willingness to address accessibility and/or the higher their level of pain & distress (such as threat of litigation), the more likely and more severely they’ll feel paralyzed by the bad news in the report.

Ideally, the client would take the report, read it in its entirety, absorb the excellent guidance contained therein, and jump in with both feet to start fixing their broken system. The consultant feels that the report is more than a report: it is a learning document. When the final spelling and grammar check has been run in Microsoft Word and the document saved in its final version, the consultant proudly ships it to the client. Idealistically, the consultant thinks their masterful wording, illustrative screenshots, and skillful code examples will trigger a revolution in the management and development of more accessible solutions by the client. Unfortunately the more likely outcome is confusion. None of the client’s pain is alleviated. Their long-term effectiveness and success with accessibility is not improved and, at best, the report becomes the basis for a series of new issues in the client’s issue tracking system. In practice, the issue reports created by the client often lack an acceptable level of detail for the issues to be properly and expeditiously repaired, further reducing the usefulness of the report.

How do we ease the client’s pain and ensure the client is successful in improving their system(s) and reducing their risk? If the delivery of the audit report doesn’t do it and the client’s repurposing of the report’s content doesn’t do it, where does this leave us? How can we more effectively help ensure client success? By becoming directly involved in alleviating their pain. By becoming the client. Remember, the root cause of the client’s pain is ignorance. The more closely the consultant works with the client as an internal stakeholder, a subject matter expert, and a mentor throughout the development lifecycle, the more directly involved the consultant is in ensuring the client’s success. Integrated into the process as a member of the team, the consultant has direct access to help steer better process and practices. This is the exact opposite of what happens when simply throwing the report over the fence, as it were. The days of the comprehensive audit cum mammoth report should come to an end, replaced with actual guidance.

This guidance can take the form of generating and delivering internal use assets for procuring, designing, developing, and maintaining ICT systems or it can take the form of direct involvement in the development and QA processes. Let’s take, for example, the QA process since it so closely resembles the audit report process in spirit.

A client with even marginally mature development processes has some system for keeping track of bugs and feature requests. This may be as simple as a spreadsheet or as complex as a standalone system which keeps track of not only the issues and feature requests but also various additional metadata related to the issues and feature requests which assist in managing, tracking, and reporting. In any case where such a system doesn’t exist, the consultant’s first order of business should be to assist the client in choosing and standing up such a system. In either case, the consultant needs to have direct access to this system equivalent to that of any other team member.

It is in this issue tracking system that the consultant must do their work, if they’re to be effective in facilitating actual improvement of the client’s systems. This is where the QA and development staff do their work and, as a part of the team, the consultant should work there too. Working within a robust issue tracking system, the consultant can log issues as soon as they find them and take part in the ensuing discussion among development staff regarding the nature, severity, and priority of each issue. Will it take a long time to fix? Will it be difficult? What are the dependencies? How will the user benefit? How will the business logic or presentation logic be affected? How can the repair be verified? These are among the many questions that the development staff might ask that require the input and collaboration of the consultant. They also require a level of discussion not available in a long, one-sided report, no matter how detailed. Direct access to and use of the client’s issue tracking system facilitates this seamless collaboration.

Merely replacing the mammoth report deliverable with direct issue logging obviously isn’t enough to address systemic ignorance of accessibility; rather, it eliminates a significant roadblock to accessibility in current systems. This limitation shows up in practice because the issues are seen only as a series of issues and not a series of learning experiences. Testing of new work will show this to be true, as the consultant is likely to discover issues identical in nature to those they had already reported in the earlier test effort(s). As mentioned in the second paragraph, the root cause of customer pain is ignorance and, while short-term pain is more effectively addressed with direct issue logging, the long-term plan must aggressively address ignorance. This is the domain of training. All persons involved in the management, design, development, and content of ICT systems should be trained to understand the need for accessibility, the specific challenges people with disabilities face when using ICT products & services, and how each staff member’s role impacts the deployment of accessible ICT. Through this role-based training the client staff’s ignorance can be systematically eliminated. In the case of training, the more permanence the training materials have, the better – up to, and including, LMS and/or video-based materials that can be used as part of an onboarding process for new employees.

Last among the mechanisms which should be employed by the consultant to address ignorance is the generation of internal assets to be used in the procurement, design, and development of ICT systems. This should include things like policies and procedures, style guides, checklists, and other job aids to be used by management and staff. These assets should serve as guidance and reference materials, success criteria, and performance measures whenever new ICT work is undertaken or existing ICT systems are improved.

The days of the monolithic accessibility audit report are numbered, as it is an outdated medium that fails to directly address the actual problems faced by the consultant’s clients. Clients, often driven by pain based in ignorance, want and deserve a more direct and proactive approach to solving the root causes of their pain. The prescriptive nature of an audit report should give way to the close involvement and leadership of a skilled consultant.


Measuring the harm of flawed academic papers

For several years I’ve been interested in finding and reading academic work in the field of web accessibility. I have a very strong belief that the things we say regarding web accessibility must be based on a significant amount of rigor, and I hold in higher esteem those who base their statements on fact rather than opinion and conjecture. Unfortunately, I often find much of the academic work in web accessibility to be deficient in many ways, likely because of a lack of experiential knowledge of the professional web development environment. Web development practices change at such a lightning-fast pace that even professional developers have trouble keeping up on what’s new. Academics who aren’t really developers in the first place are likely to have even greater trouble understanding not only the causes of accessibility issues in a web-based system but also how to test for those causes. I deal specifically with those topics 8-10 hours a day and sometimes I still have to turn to coworkers for advice and collaboration.

This matters because out-of-date knowledge and experience lead to research methods that are also out of date. The most obvious evidence of this is when web accessibility researchers perform automated testing with tools that are out of date and/or technically incapable of testing the browser DOM. Testing the DOM is a vital feature for any accessibility testing tool, especially when used in academic research, because the DOM is what the end user actually experiences. It matters even more when studying accessibility because the DOM is interpreted by the accessibility APIs which pass information about content and controls to the assistive technology employed by the user. Performing research with a tool that does not test the DOM is like measuring temperature with a thermometer you know to be broken. You have no chance of being accurate.

Recently I’ve been reading a paper titled “Benchmarking Web Accessibility Evaluation Tools: Measuring the Harm of Sole Reliance on Automated Tests”. This compellingly titled paper fails to show any instances of “sole reliance” on automated tests and further it fails to demonstrate where such sole reliance caused actual “harm” to anyone or anything. Instead, the paper reads as if it was research performed to validate a pre-determined conclusion. In doing so, the paper’s authors missed an opportunity for a much more compelling discussion: the vast performance differences between well-known accessibility testing tools. The title alludes to this with “Benchmarking Web Accessibility Evaluation Tools” but the paper then focuses instead on these ideas of “harm” and “sole reliance” while using bad results from bad tools as its evidence.

This point – that testing with automated tools only is bad – is so obvious that it almost seems unnecessary to mention. I’ve worked in accessibility and usability for a decade and many of those years were as an employee of companies that make automated testing tools. I’ve also developed my own such tools and count among my friends those who have also developed such tools. Not once do I recall the employees, owners, or developers of any such tools claiming that their automated testing product provides complete coverage. Training materials delivered by SSB BART Group and Deque Systems disclose clearly that automated testing is limited in its capability to provide coverage of all accessibility best practices. So, if “sole reliance” on automated testing is actually an issue, a better title for this paper would be “Measuring the Harm of Incomplete Testing Methodologies.” Instead, the reader is presented with what amounts to an either-or proposition through constant mention of the things that the automated tools couldn’t find vs. what human evaluators found. Thus the paper implies that either you use an automated tool and miss a bunch of stuff or you have an expert evaluate it and find everything.

This implication begins in the very first paragraph of the Introduction, which states:

The fact that webmasters put compliance logos on non-compliant websites may suggest that some step is skipped in the development process of accessible websites. We hypothesise that the existence of a large amount of pages with low accessibility levels, some of them pretending to be accessible, may indicate an over-reliance on automated tests.

Unfortunately, nowhere else in the paper is any data presented that suggests the above comments have any merit. The fact that “…webmasters put compliance logos on non-compliant websites” could mean the sites’ owners are liars. It could mean the site was at one time accessible but something changed to harm accessibility. It could mean the sites’ owners don’t know what accessibility means or how to measure it. In fact, it could mean almost anything. Without data to back it up, it really means nothing and is certainly no more likely to be evidence of “over-reliance on automated tests” than of any of the other possibilities. Instead the reader is left with the implied claim that it is this “over-reliance on automated tests” that is the culprit.

Further unproved claims include:

With the advent of WCAG 2.0 the use of automated evaluation tools has become even more prevalent.

This claim is backed up by no data of any kind. The reader is given no data from surveys of web developers, no sales figures of tools, no download numbers of free tools, not even anecdotal evidence. Instead, it continues:

In the absence of expert evaluators, organizations increasingly rely on automated tools as the primary indicator of their stated level.

And again no data is supplied to substantiate this claim. In fact, my empirical data gained from dealing with over seven dozen clients over the last decade suggests that organizations often don’t do any testing of any kind, much less automated testing. These organizations also tend to lack any maturity of process regarding accessibility in general, much less accessible development, manual accessibility testing, or usability testing. My experience is that organizations don’t “do accessibility” in any meaningful way, automated or not. The true smoking gun, as it were, for this so-called harm by “sole reliance” on automated testing would be actual data supporting the above claim. It is not supplied and there is no evidence that such data was even gathered.

Another issue with this paper is its nearly myopic discussion of accessibility as a topic concerned only with users who are blind. The most egregious example comes in the claim, referencing prior work (from 2005), that “Findings indicate that testing with screen readers is the most thorough, whilst automated testing is the least”. Later the paper states that during the expert evaluation process, “If no agreement was reached among the three judges a legally blind expert user was consulted.” While this is followed by a claim that this person is also a web accessibility expert, the paper states that “This protocol goes further and establishes a debate between judges and last resort consultation with end users.” I don’t consider the experience of a single blind user to be the same as “users” and further do not consider it likely that this single expert user’s opinion would represent the broad range of other blind users, much less all users with all disabilities. In the United States, the prevalence of vision impairment and hearing impairment is roughly equal, while the number of people with mobility impairments is more than double both of those combined. Cognitive disabilities account for a larger population than the previous three types combined. Clearly the opinion, however skilled, of a single person who is blind is in no way useful as a means of measuring the accessibility of a website for all users with disabilities.

Further problems with the expert evaluation have to do with the ad-hoc nature of the expert evaluation process:

The techniques used to assess the accessibility of each web page are diverse across judges: evaluation tools that diverge from the ones benchmarked (WAVE2), markup validators, browser extensions for developers (Firebug, Web Developer Toolbar, Web Accessibility Toolbar), screen readers (VoiceOver, NVDA) and evaluation tools based on simulation such as aDesigner[24]

The above passage betrays two rather significant flaws in both the paper itself and the evaluation process. The first is the rather humorous irony that some of the tools listed are by their nature automated testing tools. Both the WAVE web service and WAVE toolbar provide visual representation of automated test results for the page being tested. Markup validators are automated evaluation tools which access the code and automatically assess whether the markup itself is valid. In other words, the expert evaluation process used automated tools. In the process, the evaluators illustrate that no skilled evaluator would rely solely on the results from automated tools. Adding to the irony, there is no discussion of any evaluation methods other than testing with screen readers. This further supports my argument that this paper has a myopic focus on blindness. The second and more important flaw is that there appears to have been no predefined methodology in place for their evaluation. Instead it appears to be assumed that either the reader will trust that the reviewers’ expertise speaks for itself or that a rigorous methodology is unnecessary. Regardless of why, the fact that the paper doesn’t supply a detailed description of the expert evaluation methodology is cause to question the accuracy and completeness of, at the very least, the results of such evaluation.

If the purpose of the paper is to evaluate what machines find, measured against the results uncovered by expert evaluators, then it is critical that the human evaluation methods be disclosed in explicit detail. Based on the information provided, it would appear that the expert evaluation happened in a much more ad hoc fashion, with each expert performing testing in whatever fashion they deemed fit. The problem with this practice is that regardless of the level of expertise of the evaluators, there will always be differences in what and how the testing was done. The importance of this cannot be overstated. This is a frequent topic of discussion at every accessibility consulting firm I’ve worked for. The number and kind(s) of problems discovered can vary significantly depending upon who does the testing, and the looser the consulting firm’s methodology (or lack thereof in some cases), the more variance in what is reported. In fact, at a previous job one client once remarked “I can tell who wrote which report just based on reading it”. This, to me, is a symptom of a test methodology that lacks rigor. On the upside, the paper does describe a seemingly collaborative workflow where the judges discuss the issues found, but this is still not the same as having and following a predefined rigorous methodology. A rigorous manual testing methodology would be strengthened even further by the judges’ collaboration.

In this same section on Expert Evaluations, the paper states that “Dynamic content was tested conducting usability walkthroughs of the problematic functionalities…” and yet the process of conducting these “usability walkthroughs” is never described. The paper does not say how many participants (if any) took part in these usability walkthroughs and does not disclose any details about the participants, their disabilities, their assistive technologies, and so on. Again, the reader is expected to assume this was performed with rigor.

Exacerbating the above, the paper does not provide any details on what the expert evaluation discovered. Some of this data is posted at http://www.markelvigo.info/ds/bench12, but it discloses only raw issue counts, not specific descriptions of what the issues were, where they occurred, and why they existed. There is also no discussion of the severity of the issues found. While I realize that listing this level of detail in the paper itself would be inappropriate, sharing the results of each tool and of each expert evaluator at the URL mentioned above would help validate the paper’s claims. In fact, the paper itself undermines the expert evaluation results as a useful standard against which to measure the tools when it states that:

Even if experts perform better than tools, it should be noted that experts may also produce mistakes so the actual number of violations should be considered an estimation…

If the experts make mistakes, and the likelihood of such mistakes is so high that “…the actual number of violations should be considered an estimation…”, then the results these evaluators produced are of no use as a standard for the subsequent benchmarking of the tools. Remember, the purpose of this paper is to supply some form of benchmark. You can’t measure something against an inaccurate benchmark and expect reliable or useful data.

The description of the approach for automated testing does not disclose the specific version of each tool used or the dates of the testing. Nor does the paper disclose what level of experience the people using the tools had with each specific tool, or what configuration settings, if any, were applied. The tool version can, at times, be critical to the nature and quality of the results. For instance, Version 5 of Deque’s Worldspace contained changes significant enough to make a huge difference between its results and those of its predecessor. Similarly, SSB BART Group’s AMP is on a seasonal release schedule, and its releases have in the past produced big differences in test results. Automated testing tools have historically been known for generating false positives. The more robust tools can be configured to avoid or diminish this, but whether this was done is not known. Not disclosing details on the testing tools makes it difficult to verify the accuracy of the results. Were the results they found (or did not find) due to flaws in the tool(s), flaws in the configuration of the tools, or flawed use of the tools? Without more details, it isn’t possible to know whether any of these factors influenced the results.

To that point, it also bears mentioning that some of the tools used in this paper do not test the DOM; that is, they evaluate the HTML source as served rather than the document as it exists after scripts have run. Specifically, I’m aware that TAW, TotalValidator, and aChecker do not test the DOM. SortSite and Worldspace do, and the latest versions of AMP reportedly do as well. This means there is a built-in discrepancy in what the tools employed actually test. This discrepancy quite obviously leads to significant differences in the results delivered and, considering the importance of testing the DOM, calls into question the reason for including half of the tools in this study. On one hand, it makes sense in this case to include popular tools no matter what, but on the other hand, using tools that are known to be broken seems to set up a pre-determined conclusion to the study. This skews the results and ensures that more issues are missed than should be.

The numerous flaws discussed above do not altogether make this paper worthless. The data gathered is very useful in providing a glimpse into the wide range of performance differences between automated testing tools. The issues I’ve discussed above certainly invalidate the paper’s claim to be a “benchmark” study, but it is nonetheless compelling to see the differences between the tools, especially the finding that a tool which out-performs its peers in one area may under-perform even more significantly in others. The findings paint a picture of an automated testing market where tool quality differs in wild and unpredictable ways which a non-expert customer may be unprepared to understand. Unfortunately, the data behind these various conclusions isn’t exposed in a way that facilitates public review. As mentioned, some of this data is available at http://www.markelvigo.info/ds/bench12. It is interesting to read the data on the very significant disparities between the tools, and also sad that it has to be presented in a paper that is otherwise seriously flawed and obviously biased.

An unbiased academic study into the utility and best practices of automated testing is sorely needed in this field. I’ve taken my own stab at describing what can be tested and how, and I stand by that information. I’ve attempted to discuss prioritizing remediation of accessibility issues. I’ve recommended a preferred order for different testing approaches. At the same time, none of this is the same as a formal academic inquiry into these topics. We’ll never get there with academic papers that are clearly driven by bias for or against specific methodologies.


Markel Vigo has updated the page I cited, where some of the paper’s data can be found, with a response to this blog post. Like him, I encourage you to read the paper. In his response, he says:

We do not share our stimuli used and data below by chance, we do it because in Science we seek the replicability of the results from our peers.

My comments throughout this blog post remain unchanged. The sharing of raw issue counts isn’t enough to validate the claims made in this paper. Specifically:

  1. There is no data to substantiate the claim of “sole reliance” on tools
  2. There is no data to substantiate the claim of “harm” done by the supposed sole reliance
  3. There is no data shared on the specific methodology used by the human evaluators
  4. There is no data shared regarding the exact nature and volume of issues found by human evaluators
  5. There is no data shared regarding the participants of the usability walkthroughs
  6. There is no data shared regarding the exact nature and volume of issues found by the usability walkthroughs
  7. There is no information shared regarding the version(s) of each tool and specific configuration settings of each
  8. There is no data shared regarding the exact nature and volume of issues found by each tool individually
  9. There is no data shared which explicitly marks the difference between the exact issues found or not found by each tool vs. human evaluators

It is impossible to reproduce this study without this information.

In his response, Markel states that this blog post makes “…serious accusations of academic misconduct…”. I have no interest in making any such accusations against any person. Throughout my life I’ve apparently developed the rare ability to separate the person from their work. I realize that my statement about this paper’s bias can be interpreted as a claim of academic misconduct, but that’s simply an avenue down which I will not travel. Markel Vigo has contributed quite a bit to the academic study of web accessibility, and I wouldn’t dare accuse him or the other authors of misconduct of any kind. Nevertheless, the paper does read as though it were research aimed at a predetermined conclusion. Others are welcome to read the paper and disagree.

Finally, the response states:

Finally, the authors of the paper would like to clarify that we don’t have any conflict of interest with any tool vendor (in case the author of the blog is trying to cast doubt on our intentions).

Let me be clear to my readers: Nowhere in this blog post do I state or imply that there’s any conflict of interest with any tool vendor.

I’m available for accessibility consulting, audits, VPATs, training, and accessible web development. Email me directly at karl@karlgroves.com or call me at +1 443-875-7343