Machine driven classification of open-ended responses (MDCOR): An analytic framework and no-code, free software application to classify longitudinal and cross-sectional text responses in survey and social media research

Manuel S González Canché

doi:10.1016/j.eswa.2022.119265

Abstract

Open-ended questions in survey research allow participants to respond freely in their own words. Because such questions can reveal how or why respondents achieved a goal or behaved in certain ways, these responses can address some of the inherent limitations of quantitative research, which typically does not allow researchers to understand processes or reasons. But these knowledge-based benefits come at the cost of having to label text data into categories or codes to ease comparison and reach meaningful understandings. Manually classifying open-ended responses is not only time-consuming (potentially taking weeks or even months, depending on sample size) but also risks introducing human errors or inconsistencies that can reduce the contribution of these responses to strengthening our understandings. In this study, we discuss the unresolved issue of how to close open-ended responses as rigorously and efficiently as possible by relying on machine learning and text classification techniques, without losing context or the original voices of our research participants, while leveraging the nuances that human reasoning brings to the qualitative and mixed methods analytic tables. To this end, we offer a rigorous, user-friendly, no-code, and cost-free software application that implements our mixed equal-status design analytic framework: machine driven classification of open-ended responses (MDCOR). To test the performance of MDCOR, we analyzed tens of thousands of open-ended responses from two different surveys, one publicly available and one federally protected. In all instances, MDCOR consistently offered time-efficient and reliable results and even tested whether non-response was associated with respondents’ attributes. Among its multiple outputs, MDCOR gives researchers access to the fully classified responses, which can then be used in traditional quantitative modeling.
Because MDCOR runs locally and handles both cross-sectional and longitudinal responses, it enables the analysis of a variety of data, from federally protected/restricted sources to social media posts. By removing the burden of manual classification and the need for computer programming expertise, MDCOR makes it possible to reap the knowledge-based benefits of open-ended responses in survey research efficiently and rigorously, without losing or altering our participants’ voices. We offer access to the public data analyzed (https://cutt.ly/YNmBOAL or González Canché, 2022c) and the software (Mac version at https://cutt.ly/xv6nnuN, Windows version at https://cutt.ly/Lv6nTLG, Code Ocean version at González Canché, 2022f) so that researchers can interact first-hand with MDCOR and start using this tool and analytic framework in their own studies (see also González Canché, 2022a, 2022b, 2022d, 2022e, for related no-code data science applications to analyze qualitative data dynamically).

Full Text