Abstract

Benford's law states that, in many collections of numbers, the distribution of the first digit different from 0 [first significant digit (FSD)] is not uniform. The aim of this study was to evaluate whether population-based cancer incidence rates follow Benford's law, and whether this property can be used in their data quality check process. We sampled 43 population-based cancer registry populations (CRPs) from Cancer Incidence in 5 Continents, volume X (CI5-X). The distribution of the cancer incidence rate FSDs was evaluated overall, by sex, and by CRP. Several statistics, including Pearson's coefficient of correlation and distance measures, were applied to check adherence to Benford's law. In the whole dataset (146,590 incidence rates) and for each sex (70,722 male and 75,868 female incidence rates), the FSD distributions were Benford-like. The coefficient of correlation between the observed and expected FSD distributions was extremely high (0.999), and the distance measures were low. For the individual CRPs (from 933 to 7,222 incidence rates each), the results were in agreement with Benford's law, and only a few CRPs showed possible discrepancies from it. This study demonstrated for the first time that cancer incidence rates follow Benford's law. This characteristic can be used as a new, simple, and objective tool in data quality evaluation. The analyzed data had already been checked for publication in CI5-X, so their quality was expected to be good. Indeed, several statistics were consistent with possible violations for only a few CRPs.

Highlights

  • Benford’s law [1], originally identified by Newcomb [2], states that in many numerical series the distribution of the first significant digits (FSDs) is not uniform

  • The CI5 volume X (CI5-X) website [9] provides the data of the 290 population-based cancer registries included in the publication, detailed by all 424 cancer registry populations (CRPs), since each cancer registry can provide information for the whole population and for different racial and/or ethnic subgroups within the same population

  • When all the cancer incidence rates were considered together (146,590 observations), the distribution of the FSDs was positively skewed (0.84), with the mean (3.38) greater than the median (3.0). These values were close to those of the theoretical Benford distribution, as were the ratios of the 1st to the 9th FSD and of the 1st to the 2nd FSD (1.8 vs. 1.7). These results suggest that the FSD distribution of cancer incidence rates might adhere to Benford’s pattern
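
The theoretical reference values against which the observed mean, median, skewness, and FSD ratios are compared can be recomputed from Benford's formula, P(d) = log10(1 + 1/d) for d = 1, ..., 9. The following is a minimal sketch, not the authors' code: the function name `benford_summary` is hypothetical, and the skewness is the population (moment-based) skewness, which may differ slightly from the estimator used in the study.

```python
import math


def benford_summary():
    """Summary statistics of the theoretical Benford first-digit distribution."""
    # Expected probability of each first significant digit d = 1..9
    probs = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

    mean = sum(d * p for d, p in probs.items())
    variance = sum((d - mean) ** 2 * p for d, p in probs.items())
    # Population skewness: third central moment over variance^(3/2)
    skewness = sum((d - mean) ** 3 * p for d, p in probs.items()) / variance ** 1.5

    # Median: smallest digit whose cumulative probability reaches 0.5
    cumulative = 0.0
    median = None
    for d in range(1, 10):
        cumulative += probs[d]
        if cumulative >= 0.5:
            median = d
            break

    return {
        "probs": probs,
        "mean": mean,
        "median": median,
        "skewness": skewness,
        "ratio_1_vs_2": probs[1] / probs[2],
        "ratio_1_vs_9": probs[1] / probs[9],
    }


if __name__ == "__main__":
    for name, value in benford_summary().items():
        if name != "probs":
            print(f"{name}: {value:.3f}")
```

Under these assumptions the theoretical mean is about 3.44, the median is 3, and the ratio of the 1st to the 2nd FSD is about 1.7, which is the scale of the values quoted above.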


Summary

Introduction

Benford’s law [1], originally identified by Newcomb [2], states that in many numerical series the distribution of the first significant digits (FSDs) (the first non-zero digit on the left side of a number) is not uniform. Population-based cancer registries produce a large quantity of numbers, namely cancer incidence rates. Evaluating their quality is rather complex: it involves different aspects and is mainly based on knowledge of the clinical, diagnostic, and therapeutic pathways of patients and of the process of data collection and registration [6, 7]. The aim of this study is to evaluate whether population-based cancer incidence rates follow Benford’s law, and whether this can be used in their data quality check process.
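
As an illustration of the kind of check described here, the sketch below extracts the FSD from a list of rates, builds the observed FSD distribution, and compares it with the Benford expectation using Pearson's correlation coefficient and one distance measure. It is a sketch under stated assumptions rather than the study's actual procedure: all function names are hypothetical, zero rates are skipped because they have no significant digit, and the mean absolute deviation is used only as one common example of a distance measure, not necessarily one the authors applied.

```python
import math
from collections import Counter

# Expected Benford proportions for first significant digits 1..9
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]


def first_significant_digit(x):
    """Return the first non-zero digit of a positive finite number."""
    if not math.isfinite(x) or x <= 0:
        raise ValueError("FSD is only defined here for positive finite numbers")
    # Scientific notation puts the FSD first, e.g. 0.0342 -> '3.42...e-02'
    return int(f"{x:.15e}"[0])


def fsd_distribution(rates):
    """Observed proportions of FSDs 1..9; zero rates are skipped (assumption)."""
    digits = [first_significant_digit(r) for r in rates if r > 0]
    if not digits:
        raise ValueError("no positive rates to analyse")
    counts = Counter(digits)
    return [counts.get(d, 0) / len(digits) for d in range(1, 10)]


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def mean_absolute_deviation(observed, expected=BENFORD):
    """Illustrative distance measure: mean |observed - expected| over the 9 digits."""
    return sum(abs(o - e) for o, e in zip(observed, expected)) / len(expected)


if __name__ == "__main__":
    # Synthetic log-uniform values (not registry data), which follow Benford closely
    synthetic = [10 ** (i / 1000) for i in range(1000)]
    observed = fsd_distribution(synthetic)
    print("Pearson r:", pearson_r(observed, BENFORD))
    print("Mean absolute deviation:", mean_absolute_deviation(observed))
```

A correlation close to 1 together with a small distance indicates a Benford-like FSD distribution; a registry population whose statistics depart clearly from this pattern would be a candidate for closer quality review.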

