Abstract

How do we ensure a statewide voter registration database’s accuracy and integrity, especially when the database depends on aggregating decentralized, sub-state data with different list maintenance practices? We develop a Bayesian multivariate multilevel model to account for correlated patterns of change over time in multiple response variables, and label statewide anomalies using deviations from model predictions. We apply our model to California’s 22 million registered voters, using 25 snapshots from the 2020 presidential election. We estimate countywide change rates for multiple response variables, such as changes in voters’ partisan affiliation, and model these changes jointly. The model outperforms a simple interquartile range (IQR) detection rule when tested with synthetic data. This proof of concept demonstrates the utility of the Bayesian methodology: despite the heterogeneity in list maintenance practices, a principled statistical approach is useful. At the county level, the total number of anomalies is positively correlated with the average election cost per registered voter between 2017 and 2019. Given the recent efforts to modernize and secure voter list maintenance procedures in the For the People Act of 2021, we argue that checking whether counties or municipalities are behaving similarly at the state level is also an essential step in ensuring electoral integrity.
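
For readers unfamiliar with the IQR baseline mentioned above, the following is a minimal sketch of the generic interquartile range rule, not the authors' implementation. The function name `iqr_anomalies`, the multiplier `k = 1.5`, and the change-rate values are all assumptions made up for illustration; they are not taken from the California data.

```python
from statistics import quantiles


def iqr_anomalies(rates, k=1.5):
    """Return indices of values outside [Q1 - k*IQR, Q3 + k*IQR].

    `k = 1.5` is the conventional default, assumed here; the abstract
    does not state which multiplier was used.
    """
    q1, _, q3 = quantiles(rates, n=4, method="inclusive")
    spread = q3 - q1
    lower, upper = q1 - k * spread, q3 + k * spread
    return [i for i, r in enumerate(rates) if r < lower or r > upper]


# Hypothetical change rates for one county across eight snapshots
# (invented values, for illustration only).
change_rates = [0.010, 0.012, 0.011, 0.009, 0.013, 0.010, 0.085, 0.011]
flagged = iqr_anomalies(change_rates)
print(flagged)
```

A rule like this treats each series in isolation, which is presumably one reason a joint model that shares information across counties and response variables can do better on synthetic tests.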
