Election forensics: Using machine learning and synthetic data for possible election anomaly detection.

Mali Zhang, Ines Levin, R Michael Alvarez, Haroldo V Ribeiro

doi:10.1371/journal.pone.0223950

Open Access

https://doi.org/10.1371/journal.pone.0223950

Abstract

Assuring election integrity is essential for the legitimacy of elected representative democratic government. Until recently, other than in-person election observation, there have been few quantitative methods for determining the integrity of a democratic election. Here we present a machine learning methodology for identifying polling places at risk of election fraud and estimating the extent of potential electoral manipulation, using synthetic training data. We apply this methodology to mesa-level (polling-place-level) data from Argentina’s 2015 national elections.

Highlights

Assuring that an election was run in a free and fair manner has been the focus of a significant body of research.
As we argue in this paper, we can incorporate into our machine learning models the knowledge that social scientists have accumulated about election regularities (and potential election fraud) to increase the likelihood that our models detect the types of anomalies in elections data that may be the result of manipulation or error.
For supervised machine learning tools to be useful for election forensics, the analyst needs training data of some form.

Summary

Introduction

Assuring that an election was run in a free and fair manner has been the focus of a significant body of research. We generate synthetic clean and at-risk data to train a supervised classification model that can be used on the actual election data to classify mesas into clean or at-risk categories.
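
The train-on-synthetic, classify-on-real workflow described above can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration, not the paper's procedure: the two features (turnout and winner vote share), their distributions, the ballot-stuffing rule used to create at-risk mesas, and the Gaussian naive Bayes classifier are all hypothetical stand-ins. A second synthetic sample stands in for the actual election data, since no real mesa data is reproduced here.

```python
import math
import random


def simulate_mesa(rng, at_risk):
    """Return (turnout, winner_share) for one hypothetical mesa."""
    # Assumed "clean" behaviour: turnout and winner share vary around fixed means.
    turnout = min(max(rng.gauss(0.75, 0.05), 0.05), 0.95)
    share = min(max(rng.gauss(0.45, 0.08), 0.05), 0.95)
    if at_risk:
        # Assumed manipulation: part of the abstentions become winner votes.
        stuffed = rng.uniform(0.3, 0.8) * (1.0 - turnout)
        share = (share * turnout + stuffed) / (turnout + stuffed)
        turnout += stuffed
    return (turnout, share)


def make_synthetic(n, seed):
    """Generate n labelled mesas, alternating clean (0) and at-risk (1)."""
    rng = random.Random(seed)
    rows, labels = [], []
    for i in range(n):
        at_risk = i % 2 == 1
        rows.append(simulate_mesa(rng, at_risk))
        labels.append(int(at_risk))
    return rows, labels


def fit_gaussian_nb(rows, labels):
    """Estimate per-class feature means and variances (equal priors assumed)."""
    model = {}
    for cls in set(labels):
        members = [r for r, y in zip(rows, labels) if y == cls]
        stats = []
        for col in zip(*members):
            mean = sum(col) / len(col)
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-9
            stats.append((mean, var))
        model[cls] = stats
    return model


def predict(model, row):
    """Return the class whose Gaussian log-likelihood of the row is highest."""
    def log_lik(stats):
        return sum(
            -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
            for v, (mean, var) in zip(row, stats)
        )
    return max(model, key=lambda cls: log_lik(model[cls]))


# Train on synthetic data only.
train_rows, train_labels = make_synthetic(2000, seed=0)
model = fit_gaussian_nb(train_rows, train_labels)

# A second synthetic sample stands in for the actual mesa-level data.
test_rows, test_labels = make_synthetic(500, seed=1)
predicted = [predict(model, r) for r in test_rows]
accuracy = sum(p == y for p, y in zip(predicted, test_labels)) / len(test_labels)
```

With real data there are no labels to score against, so `predicted` would be read as a list of mesas flagged for closer review rather than as evidence of fraud. How well such a classifier transfers depends entirely on how closely the synthetic generator resembles real clean and manipulated returns.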

Results

Conclusion