Abstract
Purpose: This paper integrates the latent semantic features of annual report text with accounting indicators to construct a financial fraud identification model, and quantitatively analyzes the impact of different corporate risks on financial fraud behavior across industries, providing a reference for identifying financial fraud.

Design/methodology/approach: The experimental data consist of 3,860 corporate annual report samples and accounting indicators from 2001 to 2020, collected through web crawlers and the CSMAR database. A new indicator system is constructed by integrating latent semantic features with accounting indicators and textual language features. Based on this indicator system, the identification performance of multiple models is compared and a stacking-based enterprise financial fraud identification model is constructed. In addition, an econometric model is established to verify the impact of enterprise-related latent semantic features on corporate financial fraud.

Findings: The experimental results show that the stacking-based enterprise financial fraud identification model performs better than the other machine learning models compared and can effectively identify financial fraud. The econometric model built on the latent semantic information of annual reports explains the impact of different corporate trends on fraud behavior in different industries.

Originality/value: This paper combines the latent semantic features of annual report text with accounting indicators, which expands the scope of data analysis. It introduces ensemble learning to update the financial fraud identification algorithm, and constructs an econometric model for further analysis, providing a reference for financial fraud identification.
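The stacking idea described under Design/methodology/approach can be illustrated with a minimal, standard-library Python sketch. It is not the paper's implementation: the abstract does not name the base learners or the meta-learner, so a nearest-centroid scorer per feature group and a logistic-regression meta-learner are assumed here as stand-ins, the data are synthetic, and all names (`fit_centroid_scorer`, `fit_logistic`, `fit_stacking`, `accounting`, `semantic`) are hypothetical.

```python
import math
import random

def sigmoid(s):
    # Numerically stable logistic function.
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)

def fit_centroid_scorer(X, y):
    """Assumed base learner: higher score means closer to the fraud centroid."""
    def centroid(label):
        rows = [x for x, t in zip(X, y) if t == label]
        return [sum(col) / len(rows) for col in zip(*rows)]
    c_fraud, c_clean = centroid(1), centroid(0)
    def score(x):
        d_fraud = sum((a - b) ** 2 for a, b in zip(x, c_fraud))
        d_clean = sum((a - b) ** 2 for a, b in zip(x, c_clean))
        return d_clean - d_fraud
    return score

def fit_logistic(Z, y, lr=0.01, epochs=200):
    """Assumed meta-learner: logistic regression trained by plain SGD."""
    w, b = [0.0] * len(Z[0]), 0.0
    for _ in range(epochs):
        for z, t in zip(Z, y):
            g = sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b) - t
            w = [wi - lr * g * zi for wi, zi in zip(w, z)]
            b -= lr * g
    return w, b

def fit_stacking(groups, y, k=5):
    """groups: one feature matrix per feature group, rows in the same order as y."""
    n = len(y)
    Z = [[0.0] * len(groups) for _ in range(n)]
    # Out-of-fold base scores, so the meta-learner never sees a score
    # produced by a base learner that was trained on the same sample.
    for fold in range(k):
        held = range(fold, n, k)
        held_set = set(held)
        train = [i for i in range(n) if i not in held_set]
        y_train = [y[i] for i in train]
        for g, X in enumerate(groups):
            scorer = fit_centroid_scorer([X[i] for i in train], y_train)
            for i in held:
                Z[i][g] = scorer(X[i])
    w, b = fit_logistic(Z, y)
    # Refit every base learner on all samples for use at prediction time.
    scorers = [fit_centroid_scorer(X, y) for X in groups]
    def predict_proba(rows):
        # rows: one feature vector per group, in the same order as `groups`.
        z = [s(r) for s, r in zip(scorers, rows)]
        return sigmoid(sum(wi * zi for wi, zi in zip(w, z)) + b)
    return predict_proba

# Synthetic stand-ins for the two feature groups (label 1 = fraud, 0 = non-fraud).
random.seed(0)
y = [i % 2 for i in range(200)]
accounting = [[random.gauss(float(t), 1.0) for _ in range(3)] for t in y]
semantic = [[random.gauss(t - 0.5, 1.0) for _ in range(2)] for t in y]

model = fit_stacking([accounting, semantic], y)
p_fraud = model([[3.0, 3.0, 3.0], [2.0, 2.0]])
p_clean = model([[-3.0, -3.0, -3.0], [-2.0, -2.0]])
print(p_fraud, p_clean)
```

Training the meta-learner on out-of-fold scores is what separates stacking from simply averaging base models. In practice a library implementation such as scikit-learn's `StackingClassifier` would likely replace this hand-written version.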