Abstract

Securities fraud is a common worldwide problem that has serious negative consequences for securities markets each year. Securities regulatory commissions in various countries have attached great importance to the detection and prevention of securities fraud. In China, securities fraud is also increasing because of the rapid expansion of the securities market. A number of data mining techniques could help the China Securities Regulatory Commission (CSRC) in the task of securities fraud detection. In this paper, we investigate the usefulness of the Logistic regression model, Neural Networks (NNs), Sequential minimal optimization (SMO), Radial Basis Function (RBF) networks, Bayesian networks and Grammar Based Genetic Programming (GBGP) in classifying the large, real and up-to-date China Corporate Securities Fraud (CCSF) database. The six data mining techniques are compared in terms of their performance, and we find that GBGP outperforms the others. This paper describes in detail how GBGP is applied to the CCSF problem. In addition, the Synthetic Minority Over-sampling Technique (SMOTE) is applied to generate synthetic minority class examples for the imbalanced CCSF dataset.
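As a rough illustration of the oversampling idea mentioned above, the sketch below generates synthetic minority examples by interpolating between a minority sample and one of its nearest minority neighbours, which is the core step of SMOTE. The function name `smote_sketch`, the parameter values and the feature vectors are hypothetical and are not taken from the paper or from the CCSF data.

```python
import random

def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points from a list of minority feature vectors.

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Made-up two-feature vectors standing in for fraudulent (minority) firms.
fraud_firms = [[0.1, 1.0], [0.2, 1.2], [0.15, 0.9], [0.3, 1.1]]
new_points = smote_sketch(fraud_firms, n_new=4)
```

Because every synthetic point is a convex combination of two real minority samples, it stays inside the region already occupied by the minority class.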

Highlights

  • In the US, financial analysts have been confirmed to contribute to corporate fraud detection

  • Six methods are employed in this study: Logistic regression, Neural Networks (NNs), Sequential minimal optimization (SMO), Radial basis function (RBF) networks, Bayesian networks and Grammar-Based Genetic Programming (GBGP)

  • The True Positive (TP) rate is the proportion of fraudulent firms that are correctly classified as fraudulent, which is calculated by Equation (3)
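Equation (3) is not reproduced on this page. The sketch below assumes the standard definition, TP rate = TP / (TP + FN), with fraudulent firms as the positive class. The function name and the labels are made up for illustration.

```python
def true_positive_rate(actual, predicted, positive=1):
    """Return TP / (TP + FN) for the given positive class label."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("no positive examples in 'actual'")
    return tp / (tp + fn)

# 1 = fraudulent firm, 0 = non-fraudulent firm (illustrative labels only).
actual = [1, 1, 1, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 1]
rate = true_positive_rate(actual, predicted)  # 3 of the 4 fraudulent firms are caught
```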


Summary

Introduction

In the US, financial analysts have been shown to contribute to corporate fraud detection. Effective external monitoring can increase investors’ confidence, which is crucial to the functioning of any capital market [1]. This is important for China’s securities market, as corporate fraud can impede China’s economic development because it has serious consequences for stakeholders, employees and society [1].

(2014) Knowledge Discovering in Corporate Securities Fraud by Using Grammar Based Genetic Programming.

The authors of [7] investigated enforcement actions from the viewpoint of the fraudulent firms rather than the factors that lead up to fraud. They found that many of these firms have problems with published financial statements and irregular reports, such as inflated profit, false statements and major failure to disclose information, which are the common problems identified by the CSRC. Since the result of the enforcement action is either yes or no (i.e. 1 or 0), it is more reasonable to use a bivariate probit model as the learning method to analyze the data.

