Using Machine Learning Algorithms to Detect Election Fraud

Ines Levin, Julia Pomares, R. Michael Alvarez

doi:10.1017/cbo9781316257340.012

Abstract

INTRODUCTION

For more than a decade, increased scrutiny has been placed on the administration and integrity of democratic elections throughout the world (Levin and Alvarez 2012). The surge of interest in electoral integrity seems to be fueled by a number of different factors: an increase in the number of nations conducting elections, more concerns about election administration and voting technology, the increased use of social media, and a growing number of scholars throughout the world who are interested in the study of integrity and the possible manipulation of elections (Alvarez, Hall, and Hyde 2008). Although there are many ways that the integrity of elections can be assessed – for example, by studying the opinions of voters about their confidence in the conduct of elections (Alvarez, Atkeson, and Hall 2012) or through election monitoring (Bjornlund 2004; Hyde 2007, 2011; Kelley 2013) – many methodologists, statisticians, and computer scientists have contributed to the new and growing literature on “election forensics.” This body of research involves the development of a growing suite of tools – some as simple as looking at the distributions of variables, such as turnout in an election, and others that use more complex multivariate statistical models – to sift through observational data from elections to detect anomalies or outliers as potential indicators of election fraud and manipulation (Levin et al. 2009; Alvarez et al. 2014).

The literature on election forensics has now advanced a somewhat dizzying array of methods for detecting election anomalies, without providing guidance on when analysts might best use particular methods. That is, when is it best to look for anomalies in distributions of voter turnout? When should digit tests (such as Benford's Law) be applied? What about the use of regression models to detect outliers, either in single or multiple contests? How much statistical power do distributional tests have in common settings where the goal is to detect election outliers? These questions have motivated some of our recent research and have led us to consider the use of new techniques, such as machine learning, for the detection of election manipulation in nations like Venezuela (Alvarez et al. 2014). Machine learning procedures use statistical tools to find patterns in the data that reveal new and relevant information that may prove useful for performing an action or task.
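To make the idea of a digit test concrete, the following is a minimal sketch of a first-digit Benford's Law check using a chi-square statistic. It is an illustration only, not the authors' procedure: the function name `benford_first_digit_chi2` and the vote counts are hypothetical, and the first-digit variant is chosen for simplicity rather than because the chapter endorses it.

```python
import math
from collections import Counter


def benford_first_digit_chi2(counts):
    """Chi-square statistic comparing the leading digits of positive
    integer counts with the frequencies predicted by Benford's Law."""
    # Leading digit of each positive count; zero or negative counts are skipped.
    digits = [int(str(c)[0]) for c in counts if c > 0]
    n = len(digits)
    if n == 0:
        raise ValueError("need at least one positive count")
    observed = Counter(digits)
    chi2 = 0.0
    for d in range(1, 10):
        # Benford's Law: P(first digit = d) = log10(1 + 1/d)
        expected = n * math.log10(1 + 1 / d)
        chi2 += (observed.get(d, 0) - expected) ** 2 / expected
    return chi2


# Hypothetical precinct-level vote counts, invented for illustration only.
example_counts = [132, 187, 214, 96, 1043, 305, 118, 276, 451, 162, 89, 1210]
statistic = benford_first_digit_chi2(example_counts)
# With 9 digit categories there are 8 degrees of freedom;
# the 5% critical value of the chi-square distribution is about 15.51.
print(f"chi-square = {statistic:.2f}; exceeds 15.51: {statistic > 15.51}")
```

A large statistic flags a departure from the Benford distribution, which is at most an anomaly worth further inspection and not evidence of fraud by itself. With very few precincts, as in this toy input, the chi-square approximation is unreliable, which relates to the statistical power question raised above.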
