Detecting Anomalies in Tax Revenues Using Benford's Law. The Case of Polish Adjustment
- Research Article
6
- 10.1080/14772000.2016.1181683
- May 27, 2016
- Systematics and Biodiversity
Benford's phenomenological law gives the expected frequencies of the first significant digit (i.e., the leftmost non-zero digit) of any given series of numbers. According to this law, the frequency of 1 is higher than that of 2, which in turn appears more often than 3, and so on, decreasing down to 9. Benford's law can likewise be applied to the first two significant digits (i.e., from 10 to 99), and so on. We applied Benford's law to taxonomic data sets consisting of the number of taxa included in taxa of higher rank. We chose the angiosperms (Magnoliophyta) as a model case because they are very diverse, they are monophyletic, and a consensus on the taxonomy of orders and families has been achieved (the APG III classification); as data sets we used the numbers of species, genera, families, and orders. Only the numbers of species per family and per order are Benford sets; the remaining data sets do not obey Benford's law. Furthermore, in the analysis of the first two significant digits of species per genus, the deviation from Benford's law was very large, but the data fit a power law. Given that conformity to Benford's law holds for 'natural' taxonomic categories of angiosperms (i.e., species and family) but not for the more artificial ones (genus), we speculate: 'the more natural, the more Benford'.
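As a concrete illustration of the law described above (a minimal Python sketch, not code from the paper): the expected Benford frequency of a leading block of k significant digits d is log10(1 + 1/d), which covers both the single-digit (1–9) and the two-digit (10–99) tests mentioned in the abstract.

```python
import math

def benford_expected(k=1):
    """Expected Benford frequencies of the first k significant digits:
    P(d) = log10(1 + 1/d) for d in 10**(k-1) .. 10**k - 1."""
    lo, hi = 10 ** (k - 1), 10 ** k
    return {d: math.log10(1 + 1 / d) for d in range(lo, hi)}

def first_digits(x, k=1):
    """Leftmost k significant digits of a non-zero number."""
    x = abs(x)
    while x < 10 ** (k - 1):   # shift small numbers up
        x *= 10
    while x >= 10 ** k:        # shift large numbers down
        x /= 10
    return int(x)
```

For example, `benford_expected(1)[1]` is about 0.301, so roughly 30% of first digits are expected to be 1, and both the one-digit and two-digit distributions sum to 1.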
- Research Article
1
- 10.2308/isys-10247
- Mar 1, 2012
- Journal of Information Systems
Book Review
- Research Article
5
- 10.1111/corg.12195
- Mar 2, 2017
- Corporate Governance: An International Review
Manuscript Type: Empirical. Research Question/Issue: This study applies the statistical properties of Benford's Law to CEO pay. Benford's "Law" states that in an unbiased data set the first-digit values are unequally distributed, contrary to the intuitive expectation of an equal distribution. In this study we ask whether the striking empirical properties of Benford's Law can be used to analyze the negotiating power and preferences of CEOs. We argue that performance-based or market-determined compensation should follow Benford's Law, demonstrating no direct negotiation by the CEOs; conversely, deviation from Benford's Law could reveal CEO negotiating power or even preference. Research Findings/Insights: Our analysis shows that the market-determined "Option Fair Value" (the dollar value of stock options when exercised) conforms closely to Benford's Law, as opposed to "Salary", which is fully negotiated. "Bonus", "Option Award", and "Total Compensation" are also largely consistent with Benford's Law, but with some exceptions, which we interpret as negotiation by the CEOs. Surprisingly, we found that CEOs prefer to be paid in round-figure values, especially "5". We use Benford's Law to compare the negotiating power of CEOs with that of other executives. Finally, we compare the negotiating tactics of CEOs before and after SOX and analyze the impact of firm size on their compensation. Theoretical/Academic Implications: This study introduces Benford's Law and its applications to the corporate governance literature. Practitioner/Policy Implications: This method could be used by academics, industry, and regulators to uncover compensation patterns within large business departments, organizations, or even entire industry segments.
- Conference Article
2
- 10.1109/aiam50918.2020.00009
- Oct 1, 2020
This paper researches the applicability of Benford's Law to Chinese texts. Firstly, a Chinese corpus was collected and word segmentation was performed. The distributions of the first digit of frequency were calculated for words, low-frequency words, and single characters in Chinese texts, and the relative entropy (Kullback-Leibler distance) between these distributions and the general Benford's law was computed. Secondly, the parameter value range of the Generalized Benford's law was investigated, and, in view of the limitation that Zipf's law is only applicable to large amounts of data, a statistical analysis of small-scale data was carried out. Then, an experimental analysis of the first-digit probabilities of the word frequencies of single-character data was performed to explore the applicability of the Generalized Benford's law to single-character data. Finally, the applicability of Benford's law was investigated for an artificially modified corpus. The results show that the words and characters in Chinese texts conform to Benford's law, that Benford's law overcomes the limitation of Zipf's law on data set size, and that the Generalized Benford's law can discriminate the natural quality of a corpus, which has important practical significance for Chinese information processing.
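The relative-entropy comparison described above can be sketched as follows (an illustrative Python version, assuming the word or character frequencies have already been counted; the paper's exact preprocessing is not reproduced):

```python
import math
from collections import Counter

def kl_to_benford(frequencies):
    """Kullback-Leibler distance (in bits) from the observed first-digit
    distribution of a list of frequencies to Benford's distribution."""
    # Scientific notation puts the first significant digit first.
    digits = [int(f"{abs(n):e}"[0]) for n in frequencies if n != 0]
    counts, total = Counter(digits), len(digits)
    kl = 0.0
    for d in range(1, 10):
        p = counts.get(d, 0) / total      # observed frequency
        q = math.log10(1 + 1 / d)         # Benford expectation
        if p > 0:                         # 0 * log 0 is taken as 0
            kl += p * math.log2(p / q)
    return kl
```

Frequencies whose mantissas are uniform on a log scale yield a distance near zero, while a degenerate corpus whose frequencies all start with the same digit yields a large one.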
- Research Article
58
- 10.1088/0143-0807/14/2/003
- Mar 1, 1993
- European Journal of Physics
Benford's law states that the first digits of a large body of naturally occurring numerical data in decimal form are not uniformly distributed but follow a logarithmic probability distribution. The values of radioactive decay half lives, which have been accumulated throughout the present century and vary over many orders of magnitude, afford an excellent opportunity to test the predictions of this law. To this end, we examine the frequency of occurrence of the first digits of both measured and calculated values of the half lives of 477 unhindered alpha decays and compare them with the predictions of Benford's law. Good agreement is found, and a similar distribution law for second digits is also considered.
- Research Article
4
- 10.2139/ssrn.2161632
- Jan 1, 2012
- SSRN Electronic Journal
Abstract: We investigate the effectiveness of Benford's Law through a digital analysis of the off-balance-sheet account disclosures made by Turkish banks during 1990-2010. We find that the off-balance-sheet account disclosures of the fiscal year 1999 do not comply with Benford's Law. This finding is consistent with Turkish banks' practices. We also provide evidence on the Law of Anomalous Numbers. Our results indicate a link between economic policy and deviation from the frequencies of Benford's Law. Keywords: Benford's Law, Digital analysis, Banking sector, Fraud investigation. JEL Classification: G21, M42. 1. Introduction: The purpose of this paper is to investigate the effectiveness of Benford's Law in the financial reports of Turkish banks. Digital analysis based on Benford's Law has been used to detect fraud and manipulation, and it has been shown to provide signals that help reveal them. Turkish banks were chosen because creative accounting was frequently practiced in the Turkish banking sector, especially during 1990-2001. The investigation focuses on off-balance-sheet account disclosures because the fraud and manipulation methods of Turkish banks typically required accounting in the off-balance-sheet accounts. To assess the digital analysis accurately, off-balance-sheet account disclosures were studied for the twenty-year period 1990-2010. Looking at a twenty-year period is necessary because different periods carry different expectations regarding the frauds and manipulations likely to be found in the financial reports. The rest of the paper is arranged as follows. Section 2 reviews the literature on Benford's Law and digital analysis. Section 3 describes the fraud and manipulation techniques applied by Turkish banks during 1990-2001. Section 4 presents our research questions. The method is specified in Section 5.
Results are presented and discussed in Section 6. A summary and conclusions are contained in Section 7. 2. Benford's Law and Digital Analysis: Digital analysis is the comparison of the expected and observed frequencies of the digits. A difference between the expected and observed frequencies indicates that the data contain systematic error (Nigrini and Mittermaier, 1997). This systematic error may arise from the measurement methods (Hales, Sridharan, Radhakrishnan, Chakravorty and Siha, 2008) or from fraud and manipulation in the accounting records (Nigrini, 1996). On the other hand, although digital analysis reveals systematic errors, this is not final evidence of fraud or manipulation. Digital analysis is a method that shows where to look in order to obtain the best result from the data (Nigrini, 1996; Hales, Chakravorty and Sridharan, 2009); in other words, it reveals doubtful data. This property of digital analysis was first proposed by Varian (1972). According to Varian (1972), the fact that a data set complies with the Benford distribution does not confirm its realness and accuracy; however, the fact that a data set does not comply with the Benford distribution is enough to raise suspicion about it. The wide application of Benford's Law in the social sciences began in 1881 and has continued to develop up to the present day. The first study underlying Benford's Law was published by Simon Newcomb in the American Journal of Mathematics in 1881 under the title 'Note on the Frequency of Use of the Different Digits in Natural Numbers'. In this article Newcomb investigated the probability of each of the digits 1 to 9 appearing as the first digit of a number, and formulated the Frequency Law, stating that these probabilities are not equal.
According to Newcomb (1881), the probability of the digits 1 to 9 appearing as the first digit of a number decreases as the digit grows. …
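A minimal sketch of the digit-comparison step described in Section 2 (an illustrative Python version; the paper itself does not publish code, and the mean-absolute-deviation summary is one common way to report the comparison):

```python
import math
from collections import Counter

def digital_analysis(amounts):
    """Compare observed and expected first-digit frequencies of a list
    of non-zero amounts; return per-digit rows and the mean absolute
    deviation (MAD) across digits 1-9."""
    digits = [int(f"{abs(a):e}"[0]) for a in amounts if a != 0]
    n, counts = len(digits), Counter(digits)
    rows, mad = [], 0.0
    for d in range(1, 10):
        observed = counts.get(d, 0) / n
        expected = math.log10(1 + 1 / d)   # Benford frequency
        rows.append((d, observed, expected))
        mad += abs(observed - expected) / 9
    return rows, mad
```

Consistent with Varian's point above, a small MAD does not certify the data as genuine; a large MAD only flags where to look more closely.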
- Research Article
4
- 10.1108/jaar-02-2021-0037
- Jan 12, 2022
- Journal of Applied Accounting Research
Purpose: In this paper, the authors examine the association between conditional conservatism and deviations of the first digits of financial statement items from what is expected by Benford's Law. Design/methodology/approach: This research uses data on companies listed on the London Stock Exchange. The authors measure deviations of first digits from Benford's Law following Amiram et al. (2015) and firm-year conditional conservatism following previous studies (Basu, 1997; Khan and Watts, 2009; García Lara et al., 2016). The authors use multiple regressions to provide evidence for their hypothesis. Findings: The results show that conditional conservatism is positively associated with deviations from Benford's Law. The findings are robust across different measures of deviations and conditional conservatism. The authors also find that the relationship between deviations from Benford's Law and conditional conservatism is more pronounced for firms with debt issuance and for leveraged firms facing financial distress. Their analyses further confirm previous evidence by showing that the first digits of financial statement items of UK listed companies conform to Benford's Law at the firm-specific level and the market level, and that deviations of income statements are larger than those of balance sheets and cash flow statements. Research limitations/implications: The research makes significant contributions to the literature. First, this is the first study to provide empirical evidence suggesting that conditional conservatism may be a source of deviations from Benford's Law. Second, the authors provide evidence confirming previous US findings (e.g. Amiram et al., 2015) by showing that the distributions of first digits of financial statement items of UK listed companies also conform to Benford's Law. Practical implications: The findings have implications for auditors. Auditors should be aware of "false positives" for material misstatement when using Benford's Law as a risk assessment procedure. While both conditional conservatism and earnings management are related to deviations from Benford's Law, conservatism-related biases may indicate lower audit risk. Originality/value: The authors provide new and original evidence suggesting that conditional conservatism is related to deviations from Benford's Law.
- Research Article
3
- 10.17537/2022.17.230
- Nov 5, 2022
- Mathematical Biology and Bioinformatics
An empirical Benford's law, which describes the probability of the appearance of certain first significant digits in many distributions taken from real life, is used to identify anomalies in various kinds of data. Our aim was to test Benford's law as a means of assessing the quality of mass preventive screening data, using bioelectrical impedance analysis (BIA) data from Moscow health centers as an example. As was shown earlier, such data are characterized by a high level of contamination with artificially generated and falsified records. A database of BIA measurements generated over 2010-2019 contained 1,361,019 measurement records for examined persons aged 5 to 96 years. Application of the expert quality assessment algorithm, used as a reference for evaluating the effectiveness of Benford analysis, revealed a high percentage of incorrect data (66.5%), dominated by falsified records. To characterize the degree of the data's compliance with Benford's law, the mean absolute deviations of the frequency distributions of the first and first-two significant digits from their proper values, as well as chi-squared statistics for the tenth powers of the standardized resistance, reactance, and resistance index values, were assessed for each health center. A significant correlation was observed between a data set's deviation from Benford's law and its percentage of incorrect data as determined by the expert quality assessment algorithm (ρmax = 0.66 and 0.62 for the mean absolute deviations and the χ2 statistics, respectively, based on the resistance value and the first significant digit). It is suggested that deviation of BIA data from Benford's law is a sufficient, but not a necessary, condition for contamination: for those health centers in which most of the incorrect data consisted of multiple measurements of the same person under the guise of different ones, the data were in good agreement with Benford's law. Where the incorrect data were dominated by measurements of the calibration block, software emulations of BIA measurements, and outliers, Benford's law made it possible to rank health centers effectively by the level of data authenticity.
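The chi-squared ranking used to compare health centers can be sketched like this (an illustrative Python version; the function and variable names are mine, not the authors'):

```python
import math
from collections import Counter

def benford_chi2(values):
    """Pearson chi-squared statistic of first-digit counts against
    Benford's expected counts (8 degrees of freedom)."""
    digits = [int(f"{abs(v):e}"[0]) for v in values if v != 0]
    n, counts = len(digits), Counter(digits)
    return sum((counts.get(d, 0) - n * math.log10(1 + 1 / d)) ** 2
               / (n * math.log10(1 + 1 / d)) for d in range(1, 10))

def rank_by_deviation(datasets):
    """Order named data sets (e.g. one per health center) from most to
    least deviant from Benford's law."""
    return sorted(datasets, key=lambda k: benford_chi2(datasets[k]),
                  reverse=True)
```

A data set of identical fabricated values scores far higher than one whose mantissas are spread log-uniformly, so the ranking surfaces the most suspicious centers first.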
- Conference Article
14
- 10.1117/12.855085
- Apr 30, 2010
With the tremendous growth in the usage of digital images, the integrity and authenticity of digital content is becoming increasingly important and a growing concern to many government and commercial sectors. Image forensics, based on a passive statistical analysis of the image data alone, is an alternative to the active embedding of data used in digital watermarking. Benford's Law was first introduced to analyse the probability distribution of the first digits (1-9) of natural data, and has since been applied in accounting forensics to detect fraudulent income tax returns [9]. More recently, Benford's Law has been applied to image processing and image forensics. For example, Fu et al. [5] proposed a Generalised Benford's Law technique for estimating the Quality Factor (QF) of JPEG-compressed images. In our previous work, we proposed a framework incorporating the Generalised Benford's Law to accurately detect unknown JPEG compression rates of watermarked images in semi-fragile watermarking schemes. JPEG2000, a relatively new image compression standard, offers higher compression rates and better image quality than JPEG. In this paper, we propose the novel use of Benford's Law for estimating JPEG2000 compression for image forensics applications. Analysing the DWT coefficients and JPEG2000 compression of 1338 test images, the initial results indicate that the first-digit probabilities of DWT coefficients follow Benford's Law. The unknown JPEG2000 compression rate of an image can also be derived, with the help of a divergence factor that measures the deviation between the observed probabilities and Benford's Law. Over the 1338 test images, the mean divergence for DWT coefficients is approximately 0.0016, lower than that of DCT coefficients at 0.0034. However, the mean divergence for JPEG2000 images at a compression rate of 0.1 is 0.0108, much higher than for uncompressed DWT coefficients. This result clearly indicates the presence of compression in the image. Moreover, we compare the first-digit probabilities and divergences among JPEG2000 compression rates of 0.1, 0.3, 0.5 and 0.9. The initial results show that the expected differences among them could be used for further analysis to estimate unknown JPEG2000 compression rates.
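The Generalised Benford's Law referred to above (Fu et al.) is commonly written as p(d) = N·log10(1 + 1/(s + d^q)); with N = 1, s = 0, q = 1 it reduces to the standard law. A small Python sketch (the fitted parameter values depend on the compression setting and are not reproduced here; the divergence below is one simple chi-square-style measure, not necessarily the paper's exact definition):

```python
import math

def generalized_benford(d, n=1.0, s=0.0, q=1.0):
    """Generalised Benford probability of first digit d; the default
    parameters reduce it to the standard law log10(1 + 1/d)."""
    return n * math.log10(1 + 1 / (s + d ** q))

def divergence(observed):
    """Deviation of an observed first-digit probability vector
    (indices 0..8 for digits 1..9) from the standard Benford curve."""
    return sum((observed[d - 1] - generalized_benford(d)) ** 2
               / generalized_benford(d) for d in range(1, 10))
```

An exactly Benford-distributed vector has zero divergence, while a vector concentrated on one digit scores high, mirroring how the paper separates compressed from uncompressed coefficient statistics.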
- Research Article
4
- 10.1029/2024jf007691
- Sep 1, 2024
- Journal of Geophysical Research: Earth Surface
Seismic instruments placed outside of spatially extensive hazard zones can be used to rapidly sense a range of mass movements. However, it remains challenging to automatically detect specific events of interest. Benford's law, which states that the first non-zero digit of suitable data sets follows a specific probability distribution, provides a computationally cheap approach to identifying anomalies in large data sets and can potentially be used for event detection. Here, we select vertical-component seismograms to derive the first-digit distribution. The seismic signals generated by debris flows follow Benford's law, while those generated by ambient noise do not. We propose physical and mathematical explanations for the occurrence of Benford's law in debris flows. Our analysis of the limited seismic data available for landslides, lahars, bedload transport, and glacial lake outburst floods indicates that these events may follow Benford's law, whereas rockfalls do not. Focusing on debris flows in the Illgraben, Switzerland, our Benford's-law-based detector is comparable to an existing random forest model that was trained on 70 features and six seismic stations; achieving a similar result with Benford's law requires only 12 features and data from a single station. We suggest that Benford's law offers a computationally cheap, novel alternative for event recognition and potentially for real-time warnings.
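A windowed Benford check of the kind described can be sketched as follows (an illustrative Python version; the window length, step, and MAD threshold are my assumptions, not the authors' tuned detector):

```python
import math
from collections import Counter

BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]

def window_conforms(samples, threshold=0.015):
    """True if the first digits of the non-zero |samples| are within
    `threshold` mean absolute deviation of Benford's law."""
    digits = [int(f"{abs(s):e}"[0]) for s in samples if s != 0]
    n, counts = len(digits), Counter(digits)
    mad = sum(abs(counts.get(d, 0) / n - BENFORD[d - 1])
              for d in range(1, 10)) / 9
    return mad <= threshold

def detect_events(signal, window=512, step=256):
    """Flag window start indices whose amplitude first digits follow
    Benford's law (candidate mass-movement events)."""
    return [start for start in range(0, len(signal) - window + 1, step)
            if window_conforms(signal[start:start + window])]
```

Windows whose amplitudes span orders of magnitude (event-like) pass the check; windows of near-constant amplitude (degenerate, noise-like in this toy setting) do not.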
- Conference Article
1
- 10.1145/337180.337629
- Jan 1, 2000
This distribution is counter-intuitive for at least two reasons. First, it would seem "obvious" that numbers drawn from a list generated by widely different arbitrary processes would have roughly equal probabilities of 1 and 9 as first digits. This is not normally the case: if the list of numbers has no artificial limits and does not include invented numbers such as postal codes, then approximately 30% of the numbers will have 1 as their first digit, but only 5% will have 9. Deviations from the expected Benford distribution indicate the presence of some special characteristic of the data. The second, more theoretically challenging, problem is: what underlying property, shared by so many widely different processes, generates lists of numbers that follow Benford's Law? We have conducted an empirical investigation to determine under what circumstances various software metrics follow Benford's Law, and whether any special characteristics or irregularities in the data can be uncovered when the data are found not to follow the law. The trickier problem of understanding why lists of metrics might follow Benford's Law is left to another study. Lists were formed from three software metrics extracted from 100 public-domain industrial Java projects: Lines of Code (LOC), Fan-Out (FO) and McCabe Cyclomatic Complexity (MCC). Given that a Benford's Law analysis requires a list of considerable length, the data were divided into two groups. The first group came from projects containing more than 100 files; this was intended as the "control group", expected to follow Benford's Law if that law is applicable to the analysis of software engineering metrics. To study the sensitivity of the digital analysis technique to project size, projects with a smaller number of files were compared to the control group. The empirical results indicate that the first digits of numbers in lists of LOC metrics extracted from the projects followed the probabilities predicted by Benford's Law more closely than the "equal probability of occurrence" suggested by intuitive reasoning. This was shown using both qualitative and quantitative measures. The FO and MCC metrics did not follow the standard Benford's Law as well as the LOC metrics did, because the FO and MCC lists contain a significant number of values less than 10 and follow a different first-digit distribution. Further investigation of the digital analysis technique is necessary to evaluate the applicability of Benford's Law in the overall context of software metrics.
- Conference Article
9
- 10.1109/iscas.2011.5938152
- May 1, 2011
Whilst it is sometimes essential that a scene is well lit before image capture, too much light can cause exposure- or glare-based problems. Typically, glare is introduced when the camera is pointed towards the light source, and it results in a visible distortion in the image. In this paper, we analyse and identify images that contain the 'glare' property using the empirical Benford's Law. The experiment is performed on 1338 images and extracts the discrete wavelet High-High (HH), High-Low (HL) and Low-High (LH) sub-bands as raw data. The significant digit of each coefficient of all sub-bands is then calculated, and we analyse the probability of occurrence of large digits against smaller digits to detect anomalies. All images containing these anomalies are further analysed for additional salient features, in accordance with the Benford's Law plot and with the help of a probability intensity histogram and a divergence measure. Our results indicate that 142 images have irregular Benford's Law curves; for most of them, the irregularity occurs at the 5th digit. After visual examination, we found unbalanced light and a high level of brightness in these images. To measure the intensity of light in an image, we compute the probability histogram of grey levels; these results also correlate with the irregular peaks identified from the Benford's Law curves. In addition, we compute the divergence, which shows the deviation between the ideal Benford's Law curve and the curve obtained from an image. Our proposed technique is novel and has the potential to be an image forensic tool for quick image analysis.
- Front Matter
12
- 10.1517/17460441.2013.740007
- Nov 3, 2012
- Expert Opinion on Drug Discovery
The ever-increasing rate of drug discovery data has complicated data analysis and potentially compromised data quality due to factors such as data-handling errors. Parallel to this concern is the rise in blatant scientific misconduct. Combined, these problems highlight the importance of developing a method that can systematically assess data quality. Benford's law has been used to discover data manipulation and fabrication in various fields. In the authors' previous studies, it was demonstrated that the distributions of the corresponding activity and solubility data followed the Benford's law distribution, and that too aggressive a selection of the training data set of a regression model can disrupt Benford's law. Here, the authors present the application of Benford's law to a wider range of drug discovery data, such as microarray and sequence data. They also suggest that Benford's law could be applied to model building and reliability assessment in structure-activity relationship studies. Finally, the authors propose a protocol based on Benford's law that provides researchers with an efficient method for data quality assessment. However, multifaceted quality control, such as combined use with data visualization, may be needed to further improve its reliability.
- Book Chapter
- 10.23943/princeton/9780691147611.003.0004
- May 26, 2015
This chapter switches from the traditional analysis of Benford's law using data sets to a search for probability distributions that obey Benford's law. It begins by briefly discussing the origins of Benford's law through the independent efforts of Simon Newcomb (1835–1909) and Frank Benford, Jr. (1883–1948), both of whom made their discoveries through empirical data. Although Benford's law applies to a wide variety of data sets, none of the popular parametric distributions, such as the exponential and normal distributions, agree exactly with Benford's law. The chapter thus highlights the failures of several of these well-known probability distributions in conforming to Benford's law, considers what types of probability distributions might produce data that obey Benford's law, and looks at some of the geometry associated with these probability distributions.
- Research Article
- 10.1371/journal.pone.0291337
- Sep 14, 2023
- PLOS ONE
Benford's Law states that, in many real-world data sets, the frequency of numbers' first digits is predicted by the formula log(1 + (1/d)). Numbers beginning with a 1 occur roughly 30% of the time and are about six times more common than numbers beginning with a 9. We show that Benford's Law applies to the frequency rank of words in English, German, French, Spanish, and Italian. We calculated the frequency rank of words in the Google Ngram Viewer corpora and, using the first significant digit of the frequency rank, found that the FSD distribution adhered to the expected Benford's Law distribution. Across a series of additional corpora, from news to books to social media, and across the languages studied, we consistently found adherence to Benford's Law. Furthermore, at the user level on social media, we found that Benford's Law holds for the vast majority of users' collected posts, and that significant deviation from Benford's Law tends to be a mark of spam bots.
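A simplified reading of the rank-based check can be sketched in Python (tokenisation, tie-breaking, and the "one observation per token occurrence" weighting are my assumptions; a real corpus would replace the toy token list):

```python
import math
from collections import Counter

def word_rank_fsd(tokens):
    """First-significant-digit distribution of each token's frequency
    rank (rank 1 = most frequent), one observation per occurrence."""
    counts = Counter(tokens)
    rank = {w: r for r, (w, _) in enumerate(counts.most_common(), 1)}
    fsd = Counter(str(rank[t])[0] for t in tokens)
    n = len(tokens)
    return {d: fsd.get(str(d), 0) / n for d in range(1, 10)}

def mad_from_benford(dist):
    """Mean absolute deviation from Benford; large values could flag
    anomalous users (e.g. spam bots), as the paper suggests."""
    return sum(abs(dist[d] - math.log10(1 + 1 / d))
               for d in range(1, 10)) / 9
```

For the toy input `["a", "a", "b"]`, "a" has rank 1 and "b" has rank 2, so the FSD distribution puts 2/3 of the mass on digit 1 and 1/3 on digit 2.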