Empirical Analysis of Financial Statement Fraud of Listed Companies Based on Logistic Regression and Random Forest Algorithm

Xinchun Liu, Miaochao Chen

doi:10.1155/2021/9241338

Abstract

Financial supervision plays an important role in building a market economy, but financial data are nonstationary and nonlinear and have a low signal-to-noise ratio, so an effective method for detecting anomalies in financial data is needed. In this paper, two tree-based machine learning algorithms, decision tree and random forest, are used to examine companies' financial data. Based on the financial data of 100 sample listed companies, the paper presents an empirical study of financial statement fraud by listed companies using machine learning. Logistic regression, gradient boosting decision tree (GBDT), and random forest models are first applied to obtain preliminary results, and a random forest model is then used to make a secondary judgment. The paper thereby constructs an efficient, accurate, and simple comprehensive machine learning model. The empirical results show that this model achieves an accuracy of 96.58% in identifying abnormal financial data of listed companies. The paper offers capital market participants an accurate and practical method for identifying fraudulent financial statements of listed companies and is of practical significance for investors and securities research institutions dealing with financial statement fraud.
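The two-stage procedure in the abstract can be sketched with scikit-learn. This is a minimal sketch under stated assumptions, not the authors' implementation: the data are synthetic (`make_classification` stands in for the 100 companies' financial indices), and the "secondary judgment" is assumed to be a random forest stacked on the out-of-fold predicted probabilities of the three first-stage models, because the abstract does not say how the preliminary results are passed to the second stage.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (GradientBoostingClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the sample: 100 companies x 10 hypothetical financial ratios.
X, y = make_classification(n_samples=100, n_features=10, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Stage 1: the three models that produce the preliminary judgments.
base_models = [
    ("logit", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("gbdt", GradientBoostingClassifier(random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
]

# Stage 2 (assumption): a random forest re-judges each company from the
# stage-1 models' out-of-fold predicted probabilities.
stacked_rf = StackingClassifier(
    estimators=base_models,
    final_estimator=RandomForestClassifier(n_estimators=200, random_state=0),
    stack_method="predict_proba",
    cv=5,
)

# Fit every stage-1 model on its own and the stacked model, then score on the test set.
scores = {}
for name, model in base_models + [("stacked_rf", stacked_rf)]:
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

for name, acc in scores.items():
    print(f"{name}: test accuracy = {acc:.3f}")
```

Accuracies on this synthetic data say nothing about the 96.58% reported in the paper; the sketch only shows the structure of a first-stage comparison followed by a random-forest second stage.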

Highlights

Data is considered to be the source of knowledge
Each decision tree is a weak classifier trained on a random sample drawn from the full training set and feature set [11]. The random forest's classification result is determined by the combined vote of all or some of the decision trees
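The sampling-and-voting idea in the highlight above can be sketched directly on top of scikit-learn's `DecisionTreeClassifier`. This is an illustrative sketch, not the paper's code: the helper names `fit_forest` and `predict_forest` are hypothetical, the data are synthetic, and feature sampling is assumed to happen at every split (`max_features="sqrt"`, as in Breiman's random forest) because the highlight does not specify the exact scheme.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def fit_forest(X, y, n_trees=51, seed=0):
    """Train n_trees decision trees, each on a bootstrap sample of the rows.

    max_features="sqrt" makes every split consider only a random subset of
    the features, so both rows and features are sampled at random.
    """
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]
    trees = []
    for _ in range(n_trees):
        rows = rng.integers(0, n_samples, size=n_samples)  # sample rows with replacement
        tree = DecisionTreeClassifier(max_features="sqrt",
                                      random_state=int(rng.integers(2**31 - 1)))
        tree.fit(X[rows], y[rows])
        trees.append(tree)
    return trees


def predict_forest(trees, X):
    """Hard majority vote of all trees for 0/1 labels (odd n_trees avoids ties)."""
    votes = np.stack([tree.predict(X) for tree in trees])  # shape (n_trees, n_samples)
    return (votes.mean(axis=0) > 0.5).astype(int)


# Usage on synthetic data (not the paper's sample): 75 training rows, 25 test rows.
X, y = make_classification(n_samples=100, n_features=10, random_state=1)
forest = fit_forest(X[:75], y[:75])
pred = predict_forest(forest, X[75:])
print("test accuracy:", (pred == y[75:]).mean())
```

The sketch uses hard majority voting to mirror the highlight's wording; scikit-learn's own `RandomForestClassifier` instead averages the trees' predicted class probabilities (soft voting).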
This paper uses the gradient boosting decision tree model to analyze and classify the financial index data of 100 sample companies. The whole sample is divided into a training set and a test set in a 75:25 ratio. The design of the gradient boosting decision tree algorithm gives it natural advantages
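A GBDT evaluated on a 75:25 split, as described in the highlight, might look like the following scikit-learn sketch. The data are synthetic placeholders, and the hyperparameters (`n_estimators`, `learning_rate`, `max_depth`) are illustrative assumptions, since the paper's settings are not given here.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic placeholder data: 100 "companies" x 12 hypothetical financial indices.
X, y = make_classification(n_samples=100, n_features=12, n_informative=6,
                           random_state=42)

# 75:25 split of the whole sample into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42)

# Each new shallow tree is fitted to the pseudo-residuals (negative gradient of
# the log-loss) of the current ensemble; learning_rate shrinks each tree's step.
gbdt = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                  max_depth=3, random_state=42)
gbdt.fit(X_train, y_train)

accuracy = accuracy_score(y_test, gbdt.predict(X_test))
auc = roc_auc_score(y_test, gbdt.predict_proba(X_test)[:, 1])
print(f"test accuracy = {accuracy:.3f}, test AUC = {auc:.3f}")
```

Stratifying the split keeps the share of fraudulent and non-fraudulent companies similar in both sets, which matters when only 25 test cases are available.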

Summary

Introduction

Data is considered to be the source of knowledge. Massive data often contain a great deal of valuable information [1]; Walmart, for example, found that men who buy baby diapers often also buy beer to reward themselves, so it bundled the two products. To mine the knowledge and information contained in massive data more accurately, accomplishing these tasks with the help of machines [3] has become one of the most important tasks for researchers. This paper uses two supervised learning models, decision tree and random forest (RF), to detect companies' financial violations, because decision tree and random forest algorithms have certain advantages in processing financial data. One advantage of the random forest algorithm is that it produces an internal, unbiased estimate of the generalization error (the out-of-bag estimate) while the forest is being built. It also computes the proximity between pairs of cases, which is useful for data mining, detecting outliers, and visualizing data.
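Both random forest properties mentioned above can be illustrated with scikit-learn on synthetic data. In this sketch, `oob_score=True` yields the out-of-bag accuracy estimate, and the proximity of two cases is computed as the fraction of trees in which they fall into the same leaf (leaf indices come from `RandomForestClassifier.apply`). The outlier score is an assumed, simplified and unnormalized version of Breiman's outlyingness measure (the inverse of the summed squared proximities to cases of the same class); it is not taken from the paper.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=100, n_features=10, random_state=7)

# oob_score=True: each case is scored only by trees whose bootstrap sample
# excluded it, giving an internal accuracy estimate without a separate test set.
rf = RandomForestClassifier(n_estimators=300, oob_score=True, random_state=7)
rf.fit(X, y)
print(f"OOB accuracy estimate: {rf.oob_score_:.3f}")

# Proximity: fraction of trees in which two cases land in the same leaf.
leaves = rf.apply(X)  # shape (n_samples, n_trees), leaf index per tree
proximity = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)

# Simplified outlier score (assumption): cases with low proximity to the
# other members of their own class get a high score.
outlier_score = np.empty(len(y))
for i in range(len(y)):
    same_class = y == y[i]
    same_class[i] = False  # exclude the case itself
    outlier_score[i] = 1.0 / (np.sum(proximity[i, same_class] ** 2) + 1e-12)

top5 = np.argsort(outlier_score)[-5:][::-1]
print("most outlying cases:", top5)
```

The proximity matrix is symmetric with ones on the diagonal; besides outlier detection, it can be fed to multidimensional scaling to visualize the data, which is the third use named in the text.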

Related Work

Decision Tree Algorithm and Random Forest Model Construction

Financial Data Detection Based on Decision Tree and Random Forest Algorithm
