Small Data Problems in Political Research: A Critical Replication Study

Hugo De Vos, Suzan Verberne

doi:10.21248/jlcl.35.2022.226

Abstract

In an often-cited 2019 paper on the use of machine learning in political research, Anastasopoulos & Whitford (A&W) propose a text classification method for tweets related to organizational reputation. The aim of their paper was to provide a 'guide to practice' for public administration scholars and practitioners on the use of machine learning. In the current paper we follow up on that work with a replication of A&W's experiments and additional analyses of model stability and the effects of preprocessing, both in relation to the small data size. We show that (1) the small data size causes the classification model to be highly sensitive to variations in the random train-test split, and that (2) the applied preprocessing causes the data to be extremely sparse, with the majority of items in the data having at most two non-zero lexical features. With additional experiments in which we vary the steps of the preprocessing pipeline, we show that the small data size keeps causing problems, irrespective of the preprocessing choices. Based on our findings, we argue that A&W's conclusions regarding the automated classification of organizational reputation tweets -- either substantive or methodological -- cannot be maintained, and that the task requires a larger data set for training and more careful validation.
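
The two diagnostics named in the abstract, sensitivity to the random train-test split and sparsity after preprocessing, can be illustrated with a minimal Python sketch. This is not A&W's pipeline or the replication code: every function name, the whitespace tokenizer, the document-frequency cutoff, and the majority-class baseline are assumptions chosen only to keep the example self-contained.

```python
import random
from collections import Counter


def tokenize(text, stopwords=frozenset()):
    """Lowercase, split on whitespace, drop stopwords (a deliberately crude tokenizer)."""
    return [tok for tok in text.lower().split() if tok not in stopwords]


def build_vocabulary(docs, min_doc_freq=2):
    """Keep terms that occur in at least `min_doc_freq` documents (hypothetical cutoff)."""
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))
    return {term for term, n in doc_freq.items() if n >= min_doc_freq}


def share_with_at_most_k_features(docs, vocabulary, k=2):
    """Fraction of documents with at most k distinct in-vocabulary terms."""
    nonzero = [len(set(tokens) & vocabulary) for tokens in docs]
    return sum(n <= k for n in nonzero) / len(nonzero)


def majority_baseline_accuracy(labels, train_idx, test_idx):
    """Stand-in classifier: predict the most frequent training label for every test item."""
    majority = Counter(labels[i] for i in train_idx).most_common(1)[0][0]
    return sum(labels[i] == majority for i in test_idx) / len(test_idx)


def split_scores(labels, score_fn=majority_baseline_accuracy,
                 n_splits=100, test_fraction=0.3, seed=0):
    """Score `n_splits` random train-test splits; assumes the training part is non-empty."""
    rng = random.Random(seed)
    indices = list(range(len(labels)))
    n_test = max(1, int(len(indices) * test_fraction))
    scores = []
    for _ in range(n_splits):
        rng.shuffle(indices)
        # First n_test shuffled indices form the test set, the rest the training set.
        scores.append(score_fn(labels, indices[n_test:], indices[:n_test]))
    return scores
```

Reporting the minimum, maximum, and spread of the returned scores, rather than the score of a single split, makes split sensitivity visible. Computing `share_with_at_most_k_features` after each preprocessing variant shows how many items are left with almost no lexical features.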

Journal: Journal for Language Technology and Computational Linguistics
Publication Date: Jul 1, 2022
Citations: 1
License type: CC BY 4.0

Similar Papers

Neural network methods for diagnosing patient conditions from cardiopulmonary exercise testing data
Donald E Brown ... Arthur Weltman
BioData mining | VOL. 15
13 Aug 2022

SAR Target Detection Based on Domain Adaptive Faster R-CNN with Small Training Data Size
Yuchen Guo ... Guoxin Lyu
Remote Sensing | VOL. 13
20 Oct 2021

Compilation of Parallel Data Access for Vector Processor in Radio Base Stations
Wei Chen ... Peng Hao
IEEE Embedded Systems Letters | VOL. 14
01 Mar 2022

Novel virtual sample generation using conditional GAN for developing soft sensor with small data
Qun-Xiong Zhu ... Yan-Lin He
Engineering Applications of Artificial Intelligence | VOL. 106
13 Oct 2021
