Abstract

High-dimensional data are increasingly common in the physical, biological, and social sciences. Repeated data splitting is a common existing approach to the overfitting problem in high-dimensional statistics, but it is computationally expensive in high dimensions. A computationally efficient data-splitting method, referred to as Neighborhood-Based Cross Fitting (NBCF), is proposed for double machine learning in causal inference for structural causal models with high-dimensional data. The proposed method addresses the problem of post-selection bias in causal inference in the presence of high-dimensional confounding. It provides the same basis for unbiased estimation as repeated data splitting, which has been suggested as a way to broaden the complexity of the function classes permitted by empirical process methods. Numerical simulation studies demonstrate that the proposed neighborhood-based approach is not only more computationally efficient than existing sample-splitting methods but also achieves greater bias reduction than other existing methods. Under certain conditions, the simulation results further show that the proposed estimators are asymptotically unbiased and normally distributed, which allows the construction of valid confidence intervals. The practical application of NBCF is illustrated with a real dataset.
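
For orientation, the sample-splitting baseline mentioned above can be sketched as standard K-fold cross-fitting for a partially linear model, y = theta*d + g(X) + e. This is not NBCF itself, whose neighborhood-based construction is not described in this abstract. The ridge nuisance learners, the function names `ridge_fit` and `cross_fit_dml`, and the simulated data are all illustrative assumptions.

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge coefficients (no intercept); an assumed nuisance learner."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def cross_fit_dml(X, d, y, n_folds=5, lam=1.0, seed=0):
    """Cross-fitted estimate of theta in y = theta*d + g(X) + e (hypothetical helper)."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    y_res = np.empty(n)
    d_res = np.empty(n)
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        # Nuisance functions are fitted on the training folds only.
        beta_y = ridge_fit(X[train], y[train], lam)
        beta_d = ridge_fit(X[train], d[train], lam)
        # Residuals are computed on the held-out fold.
        y_res[test_idx] = y[test_idx] - X[test_idx] @ beta_y
        d_res[test_idx] = d[test_idx] - X[test_idx] @ beta_d
    # Orthogonal (residual-on-residual) estimate of the treatment effect.
    theta = (d_res @ y_res) / (d_res @ d_res)
    # Plug-in standard error based on the orthogonal score.
    eps = y_res - theta * d_res
    se = np.sqrt(np.mean(d_res**2 * eps**2) / n) / np.mean(d_res**2)
    return theta, se

# Simulated example: 50 confounders, 5 of which matter; true theta is 0.5.
rng = np.random.default_rng(42)
n, p = 2000, 50
X = rng.normal(size=(n, p))
gamma = np.zeros(p)
gamma[:5] = 1.0
d = X @ gamma + rng.normal(size=n)
y = 0.5 * d + X @ gamma + rng.normal(size=n)

theta_hat, se = cross_fit_dml(X, d, y)
ci = (theta_hat - 1.96 * se, theta_hat + 1.96 * se)  # nominal 95% interval
print(f"theta_hat={theta_hat:.3f}, se={se:.3f}, 95% CI=({ci[0]:.3f}, {ci[1]:.3f})")
```

Each observation's residuals come from nuisance models that never saw that observation, which is what limits overfitting bias. The cost is that every nuisance model is refitted once per fold, and again for each repetition if the split is repeated.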
