Abstract

Double machine learning (DML) is an increasingly popular tool for automated model selection in high-dimensional settings. The approach relies on the assumption of conditional independence, which may not hold in big-data settings where the covariate space is large. This paper shows that DML is very sensitive to the inclusion of even a few “bad controls” in the covariate space. The resulting bias varies with the nature of the causal model, which raises concerns about the feasibility of selecting control variables in a data-driven way.
