ABSTRACT The System Usability Scale (SUS) is a short, survey-based approach used to determine the usability of a system from an end-user perspective once a prototype is available for assessment. Individual scores are gathered with a ten-question survey. The survey results are reported in terms of central tendency, with the sample mean serving as an estimate of the system's usability (the SUS study score), and confidence intervals (CIs) on the sample mean are used to communicate the uncertainty associated with this point estimate. When the number of individuals surveyed is large, SUS study scores and accompanying CIs that rely on the central limit theorem are appropriate. However, when only a small number of users are surveyed, reliance on the central limit theorem falls short, producing CIs that violate the parameter bounds of the scale and interval widths that confound mappings to adjective and other constructed scales. These shortcomings are especially pronounced when the underlying SUS score data are skewed, as they are in many instances. This paper introduces an empirically based remedy for such small-sample circumstances, proposing a set of decision rules that leverage either an extended bias-corrected and accelerated (BCa) bootstrap CI or an empirical Bayesian credibility interval about the sample mean to restore and bolster CI accuracy. Data from historical SUS assessments are used to highlight shortfalls in current practices and to demonstrate the improvements these alternative approaches offer while remaining statistically defensible. A freely available online application that automates SUS analysis under these decision rules is also introduced and discussed, helping usability practitioners adopt the advocated approaches.
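For orientation, the two building blocks named above (per-respondent SUS scoring and a BCa bootstrap interval on the sample mean) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's method: it assumes the conventional SUS scoring rule (odd-numbered items positively worded, even-numbered items negatively worded, sum scaled by 2.5) and the textbook BCa interval with a jackknife acceleration estimate. It does not reproduce the paper's extended BCa procedure, its empirical Bayesian credibility interval, or its decision rules. The function names `sus_score` and `bca_mean_interval` and the sample values are hypothetical.

```python
import random
import statistics
from statistics import NormalDist


def sus_score(responses):
    """Convert one respondent's ten 1-5 answers into a 0-100 SUS score (conventional rule)."""
    if len(responses) != 10 or any(not 1 <= r <= 5 for r in responses):
        raise ValueError("expected ten responses on a 1-5 scale")
    # Items 1, 3, 5, 7, 9 (even indices) are positively worded: contribution r - 1.
    # Items 2, 4, 6, 8, 10 (odd indices) are negatively worded: contribution 5 - r.
    total = sum(r - 1 if i % 2 == 0 else 5 - r for i, r in enumerate(responses))
    return total * 2.5


def bca_mean_interval(scores, confidence=0.95, n_boot=5000, seed=0):
    """Textbook BCa bootstrap interval for the mean (NOT the paper's extended variant)."""
    scores = list(scores)
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores")
    theta_hat = statistics.fmean(scores)
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    boot = sorted(statistics.fmean(rng.choices(scores, k=n)) for _ in range(n_boot))
    norm = NormalDist()

    # Bias correction z0: normal quantile of the share of resampled means
    # below the point estimate (clamped so inv_cdf stays finite).
    share = sum(b < theta_hat for b in boot) / n_boot
    share = min(max(share, 1 / n_boot), 1 - 1 / n_boot)
    z0 = norm.inv_cdf(share)

    # Acceleration a: skewness term computed from jackknife (leave-one-out) means.
    jack = [statistics.fmean(scores[:i] + scores[i + 1:]) for i in range(n)]
    jbar = statistics.fmean(jack)
    num = sum((jbar - j) ** 3 for j in jack)
    den = 6.0 * sum((jbar - j) ** 2 for j in jack) ** 1.5
    a = num / den if den > 0 else 0.0

    def adjusted_level(p):
        # Map a nominal tail probability to its BCa-adjusted percentile level.
        z = norm.inv_cdf(p)
        return norm.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))

    def pick(p):
        # Read the bootstrap distribution at percentile level p.
        return boot[min(max(int(p * n_boot), 0), n_boot - 1)]

    alpha = (1 - confidence) / 2
    return theta_hat, pick(adjusted_level(alpha)), pick(adjusted_level(1 - alpha))


if __name__ == "__main__":
    print(sus_score([4, 2, 5, 1, 4, 2, 5, 2, 4, 1]))  # 85.0
    sample = [72.5, 85.0, 90.0, 67.5, 95.0, 80.0, 77.5, 100.0]  # hypothetical SUS scores
    print(bca_mean_interval(sample))
```

Because every endpoint is itself a resampled mean of scores that lie in [0, 100], an interval built this way cannot breach the SUS scale bounds, which is one of the small-sample failure modes the abstract attributes to CIs based on the central limit theorem.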