Missing values handling for machine learning portfolios

Andrew Y. Chen, Jack McCoy

doi:10.1016/j.jfineco.2024.103815

Open Access

https://doi.org/10.1016/j.jfineco.2024.103815

Journal: Journal of Financial Economics

Publication Date: Mar 8, 2024

Affiliation: Federal Reserve Board of Governors, Columbia University

Abstract

We characterize the structure and origins of missingness for 159 cross-sectional return predictors and study missing value handling for portfolios constructed using machine learning. Simply imputing with cross-sectional means performs well compared to rigorous expectation-maximization methods. This stems from three facts about predictor data: (1) missingness occurs in large blocks organized by time, (2) cross-sectional correlations are small, and (3) missingness tends to occur in blocks organized by the underlying data source. As a result, observed data provide little information about missing data. Sophisticated imputations introduce estimation noise that can lead to underperformance if machine learning is not carefully applied.
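The cross-sectional mean imputation named in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `impute_cross_sectional_mean`, the row layout (one dict per stock-month), and the predictor names `bm` and `mom` are all hypothetical, chosen only to show the idea of filling a missing predictor with the average of that predictor across the stocks observed on the same date.

```python
from collections import defaultdict
from statistics import fmean


def impute_cross_sectional_mean(rows, date_key, predictor_keys):
    """Fill None predictor values with the same-date mean of observed values.

    Hypothetical helper for illustration; `rows` is a list of dicts,
    one per stock-date observation.
    """
    # Collect the observed (non-missing) values for each (date, predictor).
    observed = defaultdict(list)
    for row in rows:
        for key in predictor_keys:
            if row[key] is not None:
                observed[(row[date_key], key)].append(row[key])

    imputed = []
    for row in rows:
        filled = dict(row)  # copy so the input rows are left untouched
        for key in predictor_keys:
            values = observed.get((row[date_key], key))
            # If nothing is observed on this date, the value stays missing.
            if filled[key] is None and values:
                filled[key] = fmean(values)
        imputed.append(filled)
    return imputed


# Toy panel (made-up numbers): three stocks in one month, two in the next.
panel = [
    {"date": "2020-01", "stock": "A", "bm": 1.0, "mom": 0.5},
    {"date": "2020-01", "stock": "B", "bm": None, "mom": 0.1},
    {"date": "2020-01", "stock": "C", "bm": 3.0, "mom": None},
    {"date": "2020-02", "stock": "A", "bm": None, "mom": None},
    {"date": "2020-02", "stock": "B", "bm": 4.0, "mom": None},
]
imputed = impute_cross_sectional_mean(panel, "date", ["bm", "mom"])
```

In this toy panel, `mom` is missing for every stock in the second month, so the sketch leaves it missing. That is the block-by-time pattern the abstract describes: when a predictor is absent for a whole cross-section, same-date observed values cannot inform the fill, and any fallback is a separate modeling choice.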

Full Text