Using Data to Measure Usability

Usability and the overall user experience of a software product are among the most significant factors in user satisfaction. At G2 Crowd, as part of the product review process, we capture seven different satisfaction metrics, including usability. The other metrics are ease of doing business, maintenance, meets requirements, net promoter score (NPS), setup, and support. Those will be topics for future posts; for now, though, we’ll focus on usability.
All G2 Crowd satisfaction scores are rated on a 1-7 scale, each with its own specific modifier; for usability the scale runs from 1=painful to 7=delightful. Usability is also a specific software development term, defined by an ISO standard in the late 1990s (ISO 9241-11), and it goes somewhat deeper than simple satisfaction. The standard defines usability as the ability of a predefined user role to meet quantified objectives using the software in a specific use scenario. It establishes three criteria: efficiency, effectiveness and satisfaction. Efficiency and effectiveness can be measured in a reasonably objective way against a set of predefined requirements (“does the software do this task or process?” and “does the software do this task or process in a specified time or less?”). The resulting “score” speaks to the usefulness of the software, but is still missing that last component, satisfaction.
While satisfaction is individual and not generally measurable in an objective way, it is still possible to gauge the satisfaction of the user population by collecting feedback (I should say honest feedback) and averaging a satisfaction rating across the population. The problem is twofold: 1. since satisfaction is subjective, it is subject to manipulation by external means like peer pressure, organizational pressure (fear), personal bias, etc.; and 2. it is a critical component of software adoption. If satisfaction with a product is low, it’s likely that the product won’t get used (or at least not used correctly), no matter what satisfaction is reported. Adoption, then, is also an input to the question of usability.
This more complex concept of usability is important to understand, even if your measure (or I should say, the measure I’m going to use here to talk about the usability of certain types of software) is less complex, or infers the other two criteria from a self-reported measure of usability as a form of “satisfaction”.
Now, we actually do collect data on implementability and adoption, so I suppose I could build a more complex measure that uses all of those inputs, but I’ll leave that for a future post.
Usability, measured the way I will now show you, is insulated to some extent from the problem of peer or organizational bias, so it is a reasonable proxy even if it doesn’t take into account a pre-established set of efficiency and effectiveness metrics. When individuals are asked to rate usability, they tend to fold efficiency and effectiveness into their judgement of satisfaction anyway. Relative usability is useful for comparing products and categories of products, and for tracking year-over-year changes in the perception of usability for each of those. There’s an additional data point that could provide some insight as well: whether the software met the user’s requirements (which is really the same as “how effective is the software”). As I noted above, we do collect that data, so we could build a usability index of sorts.
To analyze usability and usability trends across various categories of software, we can look at the data in a few ways. First, though, here’s a table of three measures: “meets requirements” (MR), “usability” (U) and an index created from the average of the two, for four complete years and the partial 2016.
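For anyone who wants to reproduce an index column like this from their own review data, here is a minimal sketch in Python. The record layout, the function name `usability_index` and every score in it are invented for illustration; it also assumes the index is the simple average of the mean MR and mean U scores for each category and year, which is my reading of "the average of the two" rather than a documented formula.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical review records: (category, year, meets_requirements, usability).
# Scores use the 1-7 scale; every value here is invented for illustration.
reviews = [
    ("Finance and Accounting", 2014, 6, 5),
    ("Finance and Accounting", 2014, 5, 6),
    ("Finance and Accounting", 2015, 5, 5),
    ("Collaboration and Productivity", 2014, 6, 6),
    ("Collaboration and Productivity", 2015, 7, 6),
]


def usability_index(records):
    """Return {(category, year): (mean MR, mean U, index)}.

    Assumption: the index is the average of the two mean scores.
    """
    grouped = defaultdict(lambda: {"mr": [], "u": []})
    for category, year, mr, u in records:
        grouped[(category, year)]["mr"].append(mr)
        grouped[(category, year)]["u"].append(u)

    table = {}
    for key, scores in grouped.items():
        mr_avg = mean(scores["mr"])
        u_avg = mean(scores["u"])
        table[key] = (mr_avg, u_avg, (mr_avg + u_avg) / 2)
    return table


index_table = usability_index(reviews)
for (category, year), (mr_avg, u_avg, idx) in sorted(index_table.items()):
    print(f"{category:32} {year}  MR={mr_avg:.2f}  U={u_avg:.2f}  index={idx:.2f}")
```

Averaging the two means gives MR and U equal weight, which matches the per-review average only when every review answers both questions.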
And in a form that’s a little more digestible, this time using only the calculated index column:
From this you can start to see some trends by category. For example, in Finance and Accounting software there’s improvement for three years, followed by a decline in usability in 2015 that continues into 2016. For Collaboration and Productivity software the trend is quite different: it shows improvement every year, as does Vertical Industry software and, to a lesser degree, a few other categories. The most obvious drops in the usability index are in Supply Chain and Logistics and in Design software.
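Reading trends like these off a chart amounts to looking at the year-over-year change in the index. A rough sketch of that check in Python follows; the function names and the index values are hypothetical, chosen only to mimic the two shapes described above (a rise followed by a decline, and a steady improvement), and are not the actual G2 Crowd numbers.

```python
def yearly_changes(index_by_year):
    """Return [(year, change from the previous year)] in year order."""
    years = sorted(index_by_year)
    return [
        (curr, round(index_by_year[curr] - index_by_year[prev], 2))
        for prev, curr in zip(years, years[1:])
    ]


def describe_trend(index_by_year):
    """Label a series by the sign of its year-over-year changes."""
    changes = [delta for _, delta in yearly_changes(index_by_year)]
    if not changes:
        return "not enough data"
    if all(delta > 0 for delta in changes):
        return "improving every year"
    if all(delta < 0 for delta in changes):
        return "declining every year"
    return "mixed"


# Invented index values, for illustration only.
rise_then_fall = {2012: 5.6, 2013: 5.8, 2014: 6.0, 2015: 5.9, 2016: 5.8}
steady_rise = {2012: 5.5, 2013: 5.7, 2014: 5.8, 2015: 6.0, 2016: 6.1}

print(yearly_changes(rise_then_fall))
print(describe_trend(rise_then_fall))
print(describe_trend(steady_rise))
```

A label based only on the sign of the change ignores its size, so in practice you would probably want a threshold before calling a small movement a trend.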
Since Finance and Accounting has a relatively unusual trend, at least in comparison to the rest of the chart, let’s pull it out and look at it in more detail. This chart shows all three measures, MR, U and the index, for all 5 years:
In this view you can see that MR is higher than U for 2012 and 2013, but is overtaken in 2014 and then stays fairly close through the subsequent decline. At this point I could drill into the specific products in the category to understand whether the trend is widespread or is being driven by a few large products (since I know there are a few giants in there, that could easily be the case). Anyway, I’m not writing this post to solve the usability issues with Finance and Accounting software (although I bet that would get some attention), but to show how the data can identify trends and help drill into the causes.
Here’s a different view that speaks to usability in the aggregate: the averages of the scores for each category, and a grand total that is the average of all.
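One detail worth making explicit in an aggregate view like this is what "the average of all" means. The sketch below, in Python with invented scores and hypothetical variable names, computes the per-category averages and then the grand total two ways: over every individual score, and as the average of the category averages. I am assuming the first reading here; the two differ whenever categories have different numbers of reviews.

```python
from statistics import mean

# Hypothetical usability scores (1-7 scale) per category; values are invented.
scores = {
    "Finance and Accounting": [5, 6, 6, 7],
    "Design": [6, 7],
}

# Average usability score for each category.
category_avg = {category: mean(values) for category, values in scores.items()}

# Grand total as the average of every individual score (assumed reading).
all_scores = [score for values in scores.values() for score in values]
grand_total = mean(all_scores)

# Alternative: the average of the category averages, which weights
# each category equally regardless of how many reviews it has.
mean_of_means = mean(list(category_avg.values()))

for category, avg in category_avg.items():
    print(f"{category:24} {avg:.2f}")
print(f"{'Grand total':24} {grand_total:.2f}")
print(f"{'Mean of category means':24} {mean_of_means:.2f}")
```

With a sample that grows a lot over time and varies by category, the first version lets the largest categories dominate the total, while the second treats every category alike.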
There are a few other facts that would be useful in an analysis, including:
  • The sample grows from small to very large over the course of the 5 years
  • The sample is different to a large extent every year (some reviewers are repeat users of course, but they are a small part of the sample overall)
  • The questions are consistent across the 5 years
  • The underlying products are not fixed; there could be new products in the later years, and some older products from the early years might have been replaced or even gone out of business
  • Because the data range for all the satisfaction questions tends towards the upper end and the sample size is large, the variation is mostly between 5 and 7. The variation is small, but that doesn’t negate the ability to infer trends from it.
  • In one sense this tendency for the data to fall in the upper 30% is counterintuitive. It’s almost tribal knowledge that disgruntled customers are more likely to write reviews than happy customers (or even neutral customers, for that matter). In B2B this doesn’t seem to be the case. I suspect that is partly because many reviews are captured through outreach by G2 and by the vendors. I should also note, though, that I’m not saying only happy customers are a part of that outreach. In fact, that’s not really possible considering the sheer number of respondents and the fact that the lion’s share of the reviews come from our outreach and our organic traffic, not the vendors.

Overall, the trends by category are the most relevant, since they compare similar products in similar circumstances. We could also segment the data by company size, industry, geography, role, cloud / non-cloud and of course by product / company. I’ll pull out a few more data sets like this in the near future, and maybe drill into a category all the way down to the products, with more detailed segmentation.