Abstract
The need to ensure the robustness of very large data sets produced by analytical measurement processes is increasing. This requires data screening techniques that can identify formatting or transcription errors in large data sets that have undergone multiple data-handling and manipulation procedures. The empirical observation that the digits 1 to 9 are not equally likely to appear as the initial digit in multi-digit numbers is known as Benford's Law, and it may provide a solution to this requirement. Several sets of data pertaining to the measured concentrations of pollutants in ambient air in the UK in 2004 have been analysed for their initial-digit frequencies in order to assess the potential of Benford's Law as a data-screening and authenticity-checking tool for these types of analytical data sets. Benford's Law has been shown to be a robust top-level data screening tool provided that the numerical range of the data set being considered spans four orders of magnitude or more. It has also been shown that small changes in the deviation of a data set from Benford's Law may indicate the introduction of errors during data processing. In this way, Benford's Law provides a sensitive technique for identifying data mishandling in large data sets.
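To illustrate the screening approach described above, the following is a minimal Python sketch (not taken from the paper) that compares the observed first-digit frequencies of a data set with the Benford expectation P(d) = log10(1 + 1/d); the function names and the simple absolute-deviation statistic are illustrative assumptions, not the authors' method.

```python
import math
from collections import Counter

# Expected Benford first-digit probabilities: P(d) = log10(1 + 1/d), d = 1..9
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_digit(x: float) -> int:
    """Return the leading non-zero digit of a positive number."""
    # Scientific notation guarantees exactly one non-zero digit before the point.
    return int(f"{abs(x):.15e}"[0])

def first_digit_frequencies(values):
    """Observed relative frequencies of leading digits 1..9 (zeros/NaNs skipped)."""
    digits = [first_digit(v) for v in values if v and not math.isnan(v)]
    counts = Counter(digits)
    n = len(digits)
    return {d: counts.get(d, 0) / n for d in range(1, 10)}

def benford_deviation(values):
    """Sum of absolute deviations between observed and expected frequencies.
    A hypothetical screening statistic: larger values flag possible mishandling."""
    obs = first_digit_frequencies(values)
    return sum(abs(obs[d] - BENFORD[d]) for d in range(1, 10))

if __name__ == "__main__":
    import random
    # Synthetic data spanning roughly four orders of magnitude,
    # mimicking the kind of range the abstract identifies as necessary.
    data = [10 ** random.uniform(0, 4) for _ in range(10_000)]
    freqs = first_digit_frequencies(data)
    for d in range(1, 10):
        print(d, round(freqs[d], 3), round(BENFORD[d], 3))
    print("deviation:", round(benford_deviation(data), 4))
```

In practice, a data set whose deviation statistic rises noticeably after a processing step would be flagged for inspection, which is consistent with the abstract's claim that small changes in the deviation from Benford's Law can signal errors introduced during data handling.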