Abstract

Arbitrarily placing cut-offs in data, i.e. binning, is recognised as poor statistical practice. We explore the consequences of using arbitrary cut-offs in two large datasets, the National Student Survey (2019 and 2022). These are nationwide surveys aimed at capturing student satisfaction amongst UK undergraduates. For these survey data, it is common to group the responses to the question on student satisfaction, given on a five-point Likert scale, into a ‘% satisfied’ based on the two highest categories. These ‘% satisfied’ figures are then used in further metrics. We examine the consequences of using three rather than two categories for the rankings of courses and institutions, as well as the consequences of excluding the midpoint from the calculations. Across all courses, grouping the midpoint with ‘satisfied’ leads to a median shift in satisfaction of 8.40% and 11.41% for 2019 and 2022, respectively. Excluding the midpoint from the calculations leads to a median shift in satisfaction of 4.20% and 5.70% for 2019 and 2022, respectively. While the overall stability of the rankings is largely preserved, individual courses and institutions exhibit sizeable shifts. Depending on the analysis, the most extreme shifts in rankings are between 13 and 79 ranks for courses, and between 24 and 416 ranks for institutions. Our analysis thus illustrates the potentially profound consequences of arbitrarily grouping categories for individual institutions and courses. We offer some recommendations on how this issue can be addressed, but primarily we caution against reliance on the arbitrary grouping of response categories in survey data such as the NSS.
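
To make the compared groupings concrete, a minimal Python sketch of the three ‘% satisfied’ calculations follows. This is not the authors' code; the function name, the category ordering, and the example counts are illustrative assumptions.

def pct_satisfied(counts, scheme="top2"):
    """counts: five Likert response counts, from the most negative
    category (index 0) to the most positive (index 4);
    the midpoint is counts[2]."""
    total = sum(counts)
    if scheme == "top2":         # common practice: top two categories count as satisfied
        return 100 * (counts[3] + counts[4]) / total
    if scheme == "top3":         # midpoint grouped with satisfied
        return 100 * (counts[2] + counts[3] + counts[4]) / total
    if scheme == "exclude_mid":  # midpoint excluded from the calculation
        return 100 * (counts[3] + counts[4]) / (total - counts[2])
    raise ValueError(f"unknown scheme: {scheme}")

counts = [5, 10, 20, 40, 25]  # hypothetical course with 100 respondents
for scheme in ("top2", "top3", "exclude_mid"):
    print(scheme, round(pct_satisfied(counts, scheme), 2))
# top2 65.0, top3 85.0, exclude_mid 81.25: the same responses yield
# markedly different '% satisfied' depending on the grouping chosen.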
