Abstract
Numerous financial fraud cases over the past few decades have caused massive losses for investors. In response, governments have turned their attention to the problem and issued several decrees (acts) on financial fraud. Many scholars have explored the factors associated with financial fraud; although their models performed well, most studies did not generate a set of useful rules to support auditors. Furthermore, financial statement fraud data usually present a class imbalance problem, which previous work has rarely addressed. This study therefore builds a detection model for financial statement fraud that handles both missing values and imbalanced classes. First, it uses listwise and pairwise deletion to handle missing values. Second, it proposes three merged attribute selection methods and applies a nonlinear distance correlation to select important attributes. Third, it applies undersampling and oversampling to address the class imbalance. Finally, it uses rule-based classifiers to generate a set of useful rules. In practice, this study employs a list of fraudulent companies to collect data on financial statement fraud. The results are summarized as follows: (1) pairwise deletion removes fewer records than listwise deletion when handling missing values; (2) the merged attribute selection method Com_I4 performs best on the four evaluation criteria; (3) oversampling enhances accuracy and yields the lowest type 1 and type 2 errors; (4) the random forest with Com_I4 builds the optimal model of financial statement fraud under pairwise deletion and random oversampling; and (5) ensemble learning (random forest) proves to be a robust model in this study. These results can serve as a reference for practitioners, investors, and auditing personnel.
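The missing-value and resampling steps named in the abstract can be illustrated with a minimal Python sketch. It is not the study's implementation: the toy records, the attribute names (`debt_ratio`, `roa`, `fraud`), and the three helper functions are hypothetical, and pairwise deletion is interpreted here in the usual available-case sense (a record is dropped only if it lacks a value in the attributes a given analysis actually uses).

```python
import random

# Hypothetical toy data: None marks a missing value; fraud = 1 is the minority class.
records = [
    {"debt_ratio": 0.8, "roa": None, "fraud": 1},
    {"debt_ratio": 0.4, "roa": 0.05, "fraud": 0},
    {"debt_ratio": None, "roa": 0.02, "fraud": 0},
    {"debt_ratio": 0.5, "roa": 0.07, "fraud": 0},
    {"debt_ratio": 0.9, "roa": -0.10, "fraud": 1},
    {"debt_ratio": 0.3, "roa": 0.04, "fraud": 0},
]


def listwise_delete(rows):
    """Keep only rows with no missing value in any attribute."""
    return [r for r in rows if all(v is not None for v in r.values())]


def pairwise_delete(rows, attributes):
    """Keep rows that are complete on the attributes used by one analysis."""
    return [r for r in rows if all(r[a] is not None for a in attributes)]


def random_oversample(rows, label_key, seed=0):
    """Duplicate randomly chosen rows of smaller classes until classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for r in rows:
        by_class.setdefault(r[label_key], []).append(r)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Draw with replacement to fill the gap up to the majority-class size.
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced


complete = listwise_delete(records)
available = pairwise_delete(records, ["debt_ratio", "fraud"])
balanced = random_oversample(available, "fraud")

print(len(complete), len(available), len(balanced))
```

On this toy data, listwise deletion keeps fewer records than pairwise deletion, which mirrors result (1) only in direction; the sizes carry no meaning beyond the example. In practice, resampling should be applied to the training split only, so that duplicated records do not leak into the evaluation data.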