Abstract

The COVID-19 pandemic has generated an unprecedented amount of epidemiological data. Yet concerns about the validity and reliability of the information reported by health surveillance systems have emerged worldwide. In this paper, we develop a novel approach to evaluating data integrity by combining the Newcomb-Benford Law with outlier detection methods. We demonstrate the advantages of our framework using a case study from China. To ensure more robust findings, we employ multiple diagnostic procedures, including three conformity estimates, four goodness-of-fit tests, and two distance measures (Cook and Mahalanobis). To promote transparency, we have made all computational scripts publicly available. Our findings indicate a significant deviation of the distribution of new deaths from the theoretical expectations of Benford's Law. Importantly, these results remain robust under alternative model specifications and across various statistical tests. Furthermore, the procedures developed here are easily applicable in other areas of knowledge and can be scaled to assess data quality in both the public and private sectors.
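As a rough illustration of the kind of first-digit check the Newcomb-Benford Law supports, the sketch below compares observed leading-digit frequencies with the Benford expectation P(d) = log10(1 + 1/d) using a Pearson chi-square statistic and a mean absolute deviation. It is not the authors' published script: the function names are hypothetical, the choice of these two diagnostics is an assumption, and the paper's specific conformity estimates, goodness-of-fit tests, and distance measures are not reproduced here.

```python
import math
from collections import Counter

# Critical value of the chi-square distribution with 8 degrees of freedom
# at the 5% significance level (nine digit classes minus one).
CHI2_CRIT_8DF_05 = 15.507


def benford_expected():
    """Expected first-digit proportions under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(n):
    """Leading digit of a positive integer."""
    return int(str(int(n))[0])


def benford_first_digit_check(values):
    """Return (chi_square, mean_absolute_deviation) for the first digits.

    Hypothetical helper: zero and negative entries are dropped, since
    they have no leading digit in the Benford sense.
    """
    positive = [int(v) for v in values if int(v) > 0]
    n = len(positive)
    if n == 0:
        raise ValueError("need at least one positive value")

    observed = Counter(first_digit(v) for v in positive)
    expected = benford_expected()

    # Pearson chi-square: squared count differences scaled by expected counts.
    chi2 = sum(
        (observed.get(d, 0) - n * p) ** 2 / (n * p)
        for d, p in expected.items()
    )
    # Mean absolute deviation between observed and expected proportions.
    mad = sum(
        abs(observed.get(d, 0) / n - p) for d, p in expected.items()
    ) / 9
    return chi2, mad


if __name__ == "__main__":
    # Synthetic series for illustration only, not data from the study.
    powers_of_two = [2 ** k for k in range(1, 200)]
    chi2, mad = benford_first_digit_check(powers_of_two)
    print(f"chi2={chi2:.3f} mad={mad:.4f} reject={chi2 > CHI2_CRIT_8DF_05}")
```

A chi-square statistic above the critical value suggests that the first digits depart from Benford's Law at the 5% level. Because the statistic grows with sample size, a scale-free measure such as the mean absolute deviation is usually read alongside it.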
