Fundamental Analysis of XBRL Data: A Machine Learning Approach

Xi Chen, Yang Ha Cho, Baruch Itamar Lev, Yiwei Dou

doi:10.2139/ssrn.3741015

Abstract

We conduct a fundamental analysis of detailed financial information to predict earnings. Since 2012, all U.S. public companies have been required to tag the quantitative amounts in the financial statements and footnotes of their 10-K reports using the eXtensible Business Reporting Language (XBRL). Leveraging machine learning methods, we combine the high-dimensional XBRL-tagged financial data into a summary measure of the direction of one-year-ahead earnings changes. The measure shows significant out-of-sample predictive power: its area under the curve (AUC) ranges from 67.52 to 68.66 percent, significantly higher than the 50 percent of a random guess. We form hedge portfolios based on this measure during 2015-2018; their annual size-adjusted returns range from 5.02 to 9.74 percent. These returns survive transaction costs and adjustment using the five-factor Fama and French (2015) model. Our measure and strategies outperform those of Ou and Penman (1989), who extract their summary measure from 65 accounting variables using logistic regressions. Additional analyses suggest that the outperformance stems both from nonlinear predictor interactions that regressions miss and from the use of more detailed financial data.
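To make the 50 percent benchmark concrete, the following minimal sketch computes the area under the curve for a binary "earnings increase" prediction, assuming the abstract refers to the usual receiver operating characteristic (ROC) curve. Under that reading, the AUC equals the probability that a randomly chosen firm-year with an earnings increase receives a higher score than a randomly chosen firm-year with a decrease, so uninformative scores yield 0.5. The function name `auc` and the toy labels and scores are illustrative assumptions, not the authors' code or data.

```python
def auc(labels, scores):
    """Pairwise (rank-based) AUC: share of positive/negative pairs in which
    the positive example has the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = earnings increased next year, 0 = decreased.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.7, 0.2, 0.6]  # made-up model scores

print(auc(labels, scores))          # 7 of 9 pairs ranked correctly
print(auc([1, 0], [0.5, 0.5]))      # uninformative scores give 0.5
```

The quadratic pairwise loop is chosen for transparency rather than speed; on real samples a sorted-rank implementation would be the more practical choice.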

Full Text