Abstract
As technology advances, the scale of data generated is growing exponentially, bringing huge challenges to data storage and computation. To reduce the computational cost while maintaining model estimation accuracy, subdata selection becomes important. Conventional methods, such as LASSO and ridge regression, often focus on feature selection. In contrast, subsampling methods aim to specify which data points to extract. However, these subsampling methods often do not fully consider the role of the response variable and its relationship with the predictor variables. In this work, we propose a filtering approach for model estimation (FAME) that performs subsampling in combination with feature screening. Compared with existing methods, the proposed method can yield subdata that are smaller in both the number of features and the number of observations, without increasing the computational complexity. The proposed method can be extended to situations in which the predictor is binary, the response is binary, or both are binary. The performance of FAME is evaluated in both numerical studies and real data examples.
More From: Statistical Analysis and Data Mining: The ASA Data Science Journal