Abstract

Financial statement fraud continues unabated despite legislative attempts to curtail it. This study makes a renewed attempt to aid detection of this misconduct by applying linguistic analysis and data mining to the narrative sections of annual reports/10-K forms. Departing from the features used in similar research, this paper extracts three distinct sets of features from a newly constructed corpus of narratives (408 annual reports/10-Ks, 6.5 million words) from fraud and non-fraud firms. Each of the three feature sets is put separately through a suite of classification algorithms to determine classifier performance on this binary fraud/non-fraud discrimination task. The results clearly indicate that the language used by management engaged in wilful falsification of firm performance is discernibly different from that of truth-tellers. For the first time, this interdisciplinary research extracts readability features at a much deeper level, attempts to draw out collocations using n-grams, and measures tone using appropriate financial dictionaries. This approach to fraud detection, combining linguistic analysis with machine-learning-driven data mining, could be used by auditors in assessing firms' financial reporting and in the early detection of possible misdemeanours.
