Abstract

Large-scale data from various research fields are not only heterogeneous and sparse but also difficult to store on a single machine. Expectile regression is a popular alternative to quantile regression for modeling such heterogeneous data. In this paper, we devise a distributed optimization approach to SCAD- and adaptive-LASSO-penalized expectile regression in which the observations are randomly partitioned across multiple machines. We construct a penalized communication-efficient surrogate loss (CSL) function. Computationally, our CSL-based method requires only the master machine to solve a regular M-estimation problem, while the worker machines merely compute the gradient of the loss function on their local data. Over successive rounds of communication, our method matches the estimation error bound of the centralized method. Under mild assumptions, we establish the oracle properties of the SCAD- and adaptive-LASSO-penalized expectile regression estimators. We then develop a modified alternating direction method of multipliers (ADMM) algorithm to compute the proposed estimator. A series of simulation studies assesses the finite-sample performance of the proposed estimator, and an application to an HIV study demonstrates its practical value.
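To illustrate the communication pattern the abstract describes, the following is a minimal numpy/scipy sketch of an *unpenalized* CSL iteration for expectile regression: each worker sends its local gradient at the current iterate, and the master minimizes its own local loss plus a linear gradient-correction term. The SCAD/adaptive-LASSO penalty and the modified ADMM solver from the paper are omitted, and all function names and the toy data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize

def expectile_loss(beta, X, y, tau):
    """Asymmetric squared loss: weight tau on positive residuals, 1 - tau on negative."""
    r = y - X @ beta
    w = np.where(r >= 0, tau, 1 - tau)
    return np.mean(w * r ** 2)

def expectile_grad(beta, X, y, tau):
    r = y - X @ beta
    w = np.where(r >= 0, tau, 1 - tau)
    return -2.0 * X.T @ (w * r) / len(y)

def csl_round(beta0, shards, tau):
    """One communication round (illustrative): workers send local gradients at
    beta0; the master solves its local problem with a gradient correction so the
    surrogate's gradient at beta0 matches the global loss gradient."""
    grads = [expectile_grad(beta0, X, y, tau) for X, y in shards]
    global_grad = np.mean(grads, axis=0)
    X1, y1 = shards[0]                      # master machine's shard
    shift = expectile_grad(beta0, X1, y1, tau) - global_grad
    res = minimize(lambda b: expectile_loss(b, X1, y1, tau) - shift @ b,
                   beta0,
                   jac=lambda b: expectile_grad(b, X1, y1, tau) - shift,
                   method="L-BFGS-B")
    return res.x

# Toy demo: data randomly partitioned across 4 machines, tau = 0.5
# (at tau = 0.5 the expectile reduces to ordinary mean regression).
rng = np.random.default_rng(0)
n, p, m = 2000, 5, 4
beta_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
X = rng.normal(size=(n, p))
y = X @ beta_true + 0.5 * rng.normal(size=n)
shards = [(X[i::m], y[i::m]) for i in range(m)]

beta_hat = np.zeros(p)
for _ in range(3):                          # a few CSL communication rounds
    beta_hat = csl_round(beta_hat, shards, tau=0.5)
```

Only gradients (length-`p` vectors) cross machine boundaries, which is the communication saving the abstract refers to; in the paper the master's subproblem would additionally carry the SCAD or adaptive LASSO penalty and be solved by the modified ADMM.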
