Abstract

Regression analysis is often affected by high dimensionality, severe multicollinearity, and a large proportion of missing data. These problems may mask important relationships and even lead to biased conclusions. This paper proposes a novel, computationally efficient method that integrates data imputation and variable selection to address these issues. More specifically, the proposed method combines three components: a new multiple imputation algorithm based on matrix completion (Multiple Accelerated Inexact Soft-Impute), a new randomized lasso method that is more stable and accurate (Hybrid Random Lasso), and a consistent procedure for integrating variable selection with multiple imputation. Compared to existing methodologies, the proposed approach offers greater accuracy and consistency through mechanisms that enhance robustness against different missing data patterns and sampling variations. The method is applied to the Asian American minority subgroup in the 2017 National Youth Risk Behavior Survey to study key risk factors related to suicidal intention among Asian Americans. In simulations and real data analyses across various regression and classification settings, the proposed method demonstrates enhanced accuracy, consistency, and efficiency in both variable selection and prediction.
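To make the matrix-completion idea concrete, here is a minimal sketch of the classical Soft-Impute iteration, on which the imputation component builds. It is not the paper's Multiple Accelerated Inexact Soft-Impute: the acceleration, inexact updates, and multiple-imputation steps are omitted. The function name `soft_impute` and the parameters `lam`, `n_iters`, and `tol` are illustrative assumptions.

```python
import numpy as np

def soft_impute(X, lam, n_iters=200, tol=1e-6):
    """Classical Soft-Impute (illustrative): fill NaN entries of X
    using a low-rank estimate obtained by singular-value soft-thresholding."""
    X = np.asarray(X, dtype=float)
    mask = ~np.isnan(X)                 # True where a value was observed
    X_obs = np.where(mask, X, 0.0)      # observed values, zeros elsewhere
    Z = np.zeros_like(X)                # current low-rank estimate
    for _ in range(n_iters):
        # Keep observed entries; fill missing ones from the current estimate
        filled = np.where(mask, X_obs, Z)
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        s_thr = np.maximum(s - lam, 0.0)   # soft-threshold the singular values
        Z_new = (U * s_thr) @ Vt
        # Relative change between successive estimates
        change = np.linalg.norm(Z_new - Z) / max(np.linalg.norm(Z), 1e-12)
        Z = Z_new
        if change < tol:
            break
    # Observed entries are returned unchanged; only missing ones are imputed
    return np.where(mask, X_obs, Z)
```

A larger `lam` shrinks more singular values to zero and therefore yields a lower-rank fill; in practice it would be tuned, for example by holding out some observed entries.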

