How informative is the text of securities complaints?

Adam B Badawi

doi:10.1093/jleo/ewad003

Abstract

Much of the research in law and finance reduces complex texts to a handful of variables. Legal scholars have voiced concerns that this dimensionality reduction loses important detail embedded in legal text. This article assesses that critique by asking whether text analysis can capture meaningful predictive information. It does so by applying text analysis and machine learning to a corpus of private securities class action complaints that contains over 90 million words. The analysis produces three primary findings: (1) the best-performing models predict outcomes with an accuracy rate of about 70%, which is higher than baseline rates; (2) a hybrid model that uses both text and nontext components performs better than either component alone; and (3) the predictions made by the machine learning models are associated with substantial abnormal returns in the days after cases are filed.

Full Text