Feature Selection for Dimension Reduction of Financial Data for Detection of Financial Statement Frauds in Context to Indian Companies

Sonika Gupta, Sushil Kumar Mehta

doi:10.1177/0972150920928663

Abstract

The financial fraud detection problem involves the analysis of large financial datasets. The financial statement fraud detection process concentrates on two major aspects: first, identifying the financial variables and ratios, also termed features; and second, applying data mining methods to classify organizations into two broad categories, fraudulent and non-fraudulent. If the input dataset contains a large number of irrelevant and correlated features, the computational load of the machine learning technique increases and the effectiveness of the classification decreases. Feature selection chooses a subset of the most significant attributes or variables that can represent the original data. Learning from this subset can reveal the patterns in the data in much less time while retaining accuracy, producing useful information for decision-making. This article briefly reviews the methods applied in prior studies to select features for financial statement fraud detection. It also presents a feature selection approach that uses correlation-based filter methods, in which the selection is performed with an ensemble model, and tests the outcome of the approach by applying mean ratio analysis to the financial data of Indian companies.
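A correlation-based filter typically scores each feature by its correlation with the class label (relevance) and discards features that are highly correlated with a feature already kept (redundancy). The Python sketch below illustrates only that general idea; it is not the authors' ensemble-based procedure and does not reproduce their mean ratio analysis. The feature names, the toy values, and the thresholds `relevance_min` and `redundancy_max` are hypothetical choices for illustration. Pearson correlation against a 0/1 fraud label is the point-biserial correlation.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def correlation_filter(
    features: Dict[str, List[float]],
    labels: List[int],
    relevance_min: float = 0.1,   # hypothetical threshold, not from the paper
    redundancy_max: float = 0.9,  # hypothetical threshold, not from the paper
) -> List[str]:
    """Greedy correlation-based filter (illustrative, not the paper's method)."""
    # Relevance: absolute correlation of each feature with the 0/1 fraud label.
    relevance = {name: abs(pearson(col, labels)) for name, col in features.items()}
    ranked = sorted(relevance, key=relevance.get, reverse=True)

    selected: List[str] = []
    for name in ranked:
        if relevance[name] < relevance_min:
            break  # every remaining feature is at most this relevant
        # Redundancy: skip a feature strongly correlated with one already kept.
        if all(abs(pearson(features[name], features[kept])) < redundancy_max
               for kept in selected):
            selected.append(name)
    return selected


# Toy data invented for illustration (1 = fraudulent firm, 0 = non-fraudulent).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "debt_to_equity": [3, 3, 3, 3, 1, 1, 1, 1],
    "leverage_copy":  [6, 6, 6, 5, 2, 2, 2, 2],  # near-duplicate of debt_to_equity
    "roa":            [2, 5, 1, 6, 7, 3, 8, 4],
    "current_ratio":  [1, 2, 3, 4, 4, 3, 2, 1],  # same mean in both groups
}

selected = correlation_filter(features, labels)
print(selected)  # ['debt_to_equity', 'roa']
```

On the toy data, `leverage_copy` is dropped as redundant with `debt_to_equity` (correlation of about 0.99), and `current_ratio` is dropped because its mean is identical in both groups, giving zero relevance. A greedy pass in relevance order is only one of several ways to trade relevance against redundancy.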
