Abstract

Benford's law, an empirical law describing the probability with which particular first significant digits appear in many real-life distributions, is used to identify anomalies in various kinds of data. Our aim was to test Benford's law as a tool for assessing the quality of mass preventive screening data, using bioelectrical impedance analysis (BIA) data from Moscow health centers as an example. As was shown earlier, such data are characterized by a high level of contamination by artificially generated and falsified records. The database of BIA measurements generated for 2010–2019 contained 1,361,019 measurement records for examined persons aged 5 to 96 years. Application of the expert quality assessment algorithm, which served as the reference for evaluating the effectiveness of Benford analysis, revealed a high percentage of incorrect data (66.5%), dominated by falsified data. To characterize the degree of compliance of the data with Benford's law, two measures were assessed for each health center: the mean absolute deviation of the frequency distributions of the first and of the first two significant digits from their expected values, and the chi-squared statistic. Both were computed for the tenth powers of the standardized resistance, reactance, and resistance index values. A significant correlation was observed between the deviation of the data from Benford's law and the percentage of incorrect data reported by the expert quality assessment algorithm (ρmax = 0.66 and 0.62 for the mean absolute deviation and the χ² statistic, respectively, based on the resistance value and the first significant digit). It is suggested that deviation of BIA data from Benford's law serves as a sufficient, but not a necessary, condition for their contamination. For health centers in which most of the incorrect data were multiple measurements of the same person passed off as measurements of different persons, the data were in good agreement with Benford's law. Where the structure of incorrect data was dominated by measurements of the calibration block, software emulations of BIA measurements, and outliers, the use of Benford's law made it possible to rank health centers effectively by the level of data authenticity.
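
The two compliance measures named above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`benford_expected`, `leading_digits`, `benford_deviation`) are hypothetical, and the standardization and tenth-power transformation of the BIA values are assumed to have been applied by the caller beforehand. The expected frequency of a leading digit (or digit pair) d under Benford's law is log10(1 + 1/d).

```python
import math
from collections import Counter


def benford_expected(n_digits=1):
    """Benford probabilities for the first n_digits significant digits."""
    lo, hi = 10 ** (n_digits - 1), 10 ** n_digits
    return {d: math.log10(1 + 1 / d) for d in range(lo, hi)}


def leading_digits(x, n_digits=1):
    """First n_digits significant digits of a nonzero finite number."""
    # Scientific notation puts the significant digits first: '4.32...e+02'.
    mantissa = f"{abs(x):.15e}".split("e")[0].replace(".", "")
    return int(mantissa[:n_digits])


def benford_deviation(values, n_digits=1):
    """Return (mean absolute deviation, chi-squared statistic) vs. Benford."""
    # Zeros and non-finite values have no leading digit and are skipped.
    digits = [leading_digits(v, n_digits)
              for v in values if v and math.isfinite(v)]
    n = len(digits)
    if n == 0:
        raise ValueError("no usable values")
    counts = Counter(digits)
    expected = benford_expected(n_digits)
    # Mean absolute difference between observed and expected frequencies.
    mad = sum(abs(counts[d] / n - p) for d, p in expected.items()) / len(expected)
    # Pearson chi-squared on observed vs. expected counts.
    chi2 = sum((counts[d] - n * p) ** 2 / (n * p) for d, p in expected.items())
    return mad, chi2
```

Under these assumptions, a per-center score could be obtained by calling `benford_deviation(transformed_values)` for the first digit and `benford_deviation(transformed_values, n_digits=2)` for the first two digits, where `transformed_values` is a hypothetical list holding one center's transformed resistance values. Larger values of either measure indicate a greater departure from Benford's law.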
