Abstract
Large-scale data poses great challenges for data analysis because of limited computer storage capacity and heterogeneous data structures. In this article, we propose a distributed expectile regression model that addresses these challenges by designing a surrogate loss function and computing the proposed estimator with an Iterative Local Alternating Direction Method of Multipliers (IL-ADMM) algorithm. To achieve good performance after only a few rounds of communication, the proposed method solves an M-estimation problem on the master machine only, while the other worker machines only need to compute gradients based on their local data. Moreover, we establish the consistency and asymptotic normality of the proposed estimator, and illustrate its efficiency through numerical simulations and an empirical analysis of the superconductor data.
Highlights
In recent years, machine learning techniques based on large-scale data have been widely used in inspection systems, classification, and the Internet of Things, as well as in social networks and smart cities [1]–[3].
This means that any statistical analysis involving the entire data set must deal with data communication between different storage locations, which may slow down the computation.
Combining the idea of communication-efficient surrogate likelihood (CSL) with the ADMM algorithm, we develop an Iterative Local Alternating Direction Method of Multipliers (IL-ADMM) algorithm for the distributed estimator in the linear expectile regression model.
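For orientation, the two ingredients named in the last highlight can be written in their standard textbook forms. The notation here is an assumption and may differ from the paper's: \tau is the expectile level, L_k is the average expectile loss on machine k (machine 1 being the master), K is the number of machines, n is the local sample size (taken to be equal across machines), and \bar{\theta} is the current estimate. The surrogate shown is the generic CSL construction, not necessarily the exact surrogate loss proposed in the article.

```latex
% Asymmetric squared (expectile) loss at level \tau \in (0,1)
\rho_\tau(u) = \bigl|\tau - \mathbf{1}\{u < 0\}\bigr|\, u^2

% Local and global empirical losses for the linear model y = x^\top \theta + \varepsilon
L_k(\theta) = \frac{1}{n} \sum_{i \in \text{machine } k} \rho_\tau\bigl(y_i - x_i^\top \theta\bigr),
\qquad
L_N(\theta) = \frac{1}{K} \sum_{k=1}^{K} L_k(\theta)

% Generic CSL surrogate minimised on the master machine
\tilde{L}(\theta) = L_1(\theta)
  - \bigl\langle \nabla L_1(\bar{\theta}) - \nabla L_N(\bar{\theta}),\ \theta \bigr\rangle
```

Only the gradients \nabla L_k(\bar{\theta}) need to be transmitted, which is why each round of communication costs a single vector per machine.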
Summary
Machine learning techniques based on large-scale data have been widely used in inspection systems, classification, and the Internet of Things, as well as in social networks and smart cities [1]–[3]. Another challenge arises from the numerical implementation of the proposed estimator. To this end, we adopt the Alternating Direction Method of Multipliers (ADMM) algorithm [5], [27]–[28], which is well suited to distributed convex optimization problems and large-scale statistical inference problems. Combining the idea of communication-efficient surrogate likelihood (CSL) with the ADMM algorithm, we develop an Iterative Local Alternating Direction Method of Multipliers (IL-ADMM) algorithm for the distributed estimator in the linear expectile regression model. Compared to existing approaches, our proposed method can solve expectile regression with distributed data directly, and effectively reduces the storage and transmission costs of distributed computing. We end with some concluding remarks and provide proofs in the Appendix.
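The communication pattern described above (workers send gradients, the master solves one local M-estimation problem per round) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' IL-ADMM implementation: the function names are hypothetical, the shards are assumed to have equal size, the surrogate is the generic CSL form, and the master subproblem is solved by iteratively reweighted least squares (IRLS) in place of ADMM.

```python
import numpy as np

def expectile_grad(X, y, theta, tau):
    """Gradient of the mean asymmetric squared loss on one data shard."""
    r = y - X @ theta
    w = np.where(r >= 0, tau, 1.0 - tau)  # weight tau above the fit, 1 - tau below
    return -2.0 * X.T @ (w * r) / len(y)

def solve_surrogate(X, y, shift, tau, theta0, n_iter=50, tol=1e-10):
    """Minimise L_1(theta) - <shift, theta> on the master.

    Assumption: IRLS is used here for brevity; the article uses ADMM.
    """
    theta = theta0.copy()
    n = len(y)
    for _ in range(n_iter):
        r = y - X @ theta
        w = np.where(r >= 0, tau, 1.0 - tau)
        A = 2.0 * (X.T * w) @ X / n            # weighted Gram matrix
        b = 2.0 * X.T @ (w * y) / n + shift    # linear term includes the gradient shift
        new = np.linalg.solve(A, b)
        converged = np.linalg.norm(new - theta) < tol
        theta = new
        if converged:
            break
    return theta

def distributed_expectile(shards, tau, rounds=5):
    """CSL-style iterations; shards[0] plays the role of the master machine."""
    X1, y1 = shards[0]
    p = X1.shape[1]
    # Initial estimate from the master's local data only.
    theta = solve_surrogate(X1, y1, np.zeros(p), tau, np.zeros(p))
    for _ in range(rounds):
        # One communication round: every machine reports a single gradient vector.
        grads = [expectile_grad(X, y, theta, tau) for X, y in shards]
        global_grad = np.mean(grads, axis=0)   # valid for equal shard sizes
        shift = grads[0] - global_grad
        theta = solve_surrogate(X1, y1, shift, tau, theta)
    return theta

# Usage on simulated data (hypothetical settings).
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(4000), rng.normal(size=(4000, 2))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=4000)
shards = list(zip(np.split(X, 8), np.split(y, 8)))
theta_hat = distributed_expectile(shards, tau=0.8)
print(theta_hat)
```

In this sketch the raw data never leave their shard: each round transmits one gradient vector of length p per machine, which is the storage and transmission saving the summary refers to. Replacing `solve_surrogate` with an ADMM solver would bring the sketch closer to the method described in the article.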