Abstract

This paper describes the current state of natural language processing (NLP) as it applies to corporate reporting. We document dramatic increases in the quantity of verbal content that is an integral part of company reporting packages, as well as the evolution of text analytic approaches being employed to analyse this content. We provide intuitive descriptions of the leading analytic approaches applied in the academic accounting and finance literatures. This discussion includes key word searches and counts, attribute dictionaries, naïve Bayesian classification, cosine similarity, and latent Dirichlet allocation. We also discuss how increasing interest in NLP analysis of the corporate reporting package could and should influence financial reporting regulation, and note that textual analysis is currently more of an afterthought, if it is considered at all. Opportunities for improving the usefulness of NLP analysis are discussed, as well as possible impediments.

Highlights

  • Financial accountants and economists have traditionally relied on quantitative metrics derived from financial statements as a basis for decision making

  • There is increasing recognition that financial statement metrics provide limited insights either because they do not allow one to infer nuances that may be contained in verbal discussions of financial performance or because key aspects of organizational performance and value are not reflected in financial statement results in a timely manner

  • Increases are evident for both the financial statements component of U.K. reports and the narratives component, where the number of items reported in the median table of contents increased by 50% over this period


Summary

Introduction

Financial accountants and economists have traditionally relied on quantitative metrics derived from financial statements as a basis for decision making. More recent work has started to employ mainstream NLP techniques, including cosine similarity to measure document similarity, supervised machine learning to identify document content, and unsupervised learning methods to identify topic structure in individual documents and across a wider corpus. These approaches have helped shed light on important associations between unstructured data and corporate actions. A large fraction of the content involves text (or verbal communication transcribed into text in the case of conference calls and management presentations), much of which incorporates quantitative information (Siano and Wysocki 2018). These data are classified as unstructured because the elements are not amenable to rapid automated retrieval in a consistent manner across entities and over time. The majority of these disclosure developments involve unstructured narrative commentary. This trend is compounded by reporting developments at the country- and market-level.
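
To make the cosine similarity idea mentioned above concrete, the following is a minimal sketch, not the method of any particular study. It assumes a simple bag-of-words representation with raw term counts and a crude alphabetic tokenizer; the function names `term_counts` and `cosine_similarity` are illustrative and do not come from the paper. Published work often adds steps such as stop-word removal or term weighting, which are omitted here.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercase the text and count alphabetic word tokens (assumed tokenizer)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between the raw term-count vectors of two texts."""
    a, b = term_counts(text_a), term_counts(text_b)
    # Dot product over the terms that the two documents share.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        # An empty document has no direction; treat similarity as zero.
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical sentences standing in for two years of narrative disclosure.
    this_year = "Revenue increased due to strong demand in core markets"
    last_year = "Revenue decreased due to weak demand in core markets"
    print(round(cosine_similarity(this_year, last_year), 3))
```

A score near 1 indicates that two documents use much the same vocabulary in similar proportions, while a score near 0 indicates little overlap. Because the measure ignores word order and meaning, two passages with opposite messages can still score highly if they share most of their words.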

Case study
Generic benefits of NLP
Cosine similarity approaches
Findings
Numerical example