Portfolio optimization requires, as an input, a description of the assets' joint return distribution. In estimating forward-looking distributional parameters, such as means, variances, and correlations, from historical data, investors face a conundrum. Estimates with small measurement error require a long time series, but a longer time series necessarily contains less forward-looking information unless the distribution of returns is known to be static over time. We propose to investigate the usefulness of textual company-level materials in producing better estimates of firms' return distributions. Our preliminary investigations have already yielded success. Employing a novel {\em text regression}, we have been able to predict, out of sample, firm return volatility using the Management Discussion and Analysis section of annual 10-K reports (which contains management's forward-looking views). Our text-based predictor does as well as or better than recent historical return volatility, suggesting that one can supplement historical returns with contemporary textual data to better deduce the contemporaneous properties of firms' return distributions. The project's goal is to investigate how to extend the analysis to out-of-sample predictions of correlations. The end objective is to combine textual and quantitative data to best predict the covariance structure of firms' returns. An immediate and important application is to generate mean-variance efficient portfolios that suffer less from the measurement error and backward-looking bias documented in the literature.