Abstract

Large-scale data poses great challenges for data analysis because of limited computer storage capacity and heterogeneous data structures. In this article, we propose a distributed expectile regression model that addresses these challenges by designing a surrogate loss function and computing the proposed estimator with the Iterative Local Alternating Direction Method of Multipliers (IL-ADMM) algorithm. To achieve good performance after only a few rounds of communication, the proposed method solves an M-estimation problem on the master machine alone, while the other worker machines only compute gradients based on their local data. Moreover, we establish the consistency and asymptotic normality of the proposed estimator, and illustrate its efficiency through numerical simulations and an empirical analysis of the superconductor data.
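
For orientation, the objects named in the abstract can be written out in their standard textbook form. This is a sketch under assumptions, not the paper's own notation: the symbols \tau, \beta, \bar\beta, m, n and the index sets \mathcal{I}_k are introduced here for illustration, the m machines are assumed to hold n observations each, and the surrogate shown is the generic communication-efficient surrogate construction, which may differ in detail from the surrogate loss the authors design.

```latex
% Asymmetric squared (expectile) loss at level \tau
\rho_\tau(u) = \lvert \tau - \mathbb{1}(u < 0) \rvert \, u^2, \qquad \tau \in (0, 1)

% Local empirical loss on machine k (machine 1 is the master)
L_k(\beta) = \frac{1}{n} \sum_{i \in \mathcal{I}_k} \rho_\tau\!\left(y_i - x_i^\top \beta\right), \qquad k = 1, \dots, m

% Surrogate loss built from the master's data and the averaged gradients
% evaluated at a current estimate \bar\beta
\widetilde{L}(\beta) = L_1(\beta) - \Big\langle \nabla L_1(\bar\beta) - \frac{1}{m} \sum_{k=1}^{m} \nabla L_k(\bar\beta),\; \beta \Big\rangle
```

Only the gradients \nabla L_k(\bar\beta) cross the network in this construction, which is why each round of communication costs one vector per machine.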

Highlights

  • In recent years, machine learning techniques based on large-scale data have been widely used in inspection systems, correct classification and Internet of Things, as well as in social networks and smart cities [1]–[3]

  • This means that any statistical analysis involving the entire data set must deal with data communication between different storage devices, which may slow down the computation

  • Combining the idea of communication-efficient surrogate likelihood (CSL) with the ADMM algorithm, we develop an Iterative Local Alternating Direction Method of Multipliers (IL-ADMM) algorithm for the distributed estimator in the linear expectile regression model


Summary

INTRODUCTION

Machine learning techniques based on large-scale data have been widely used in inspection systems, correct classification and Internet of Things, as well as in social networks and smart cities [1]–[3]. Another challenge arises from the numerical implementation of the proposed estimator. To this end, we adopt the Alternating Direction Method of Multipliers (ADMM) algorithm [5], [27]–[28], which is well suited to distributed convex optimization problems and large-scale statistical inference problems. Combining the idea of CSL with the ADMM algorithm, we develop an Iterative Local Alternating Direction Method of Multipliers (IL-ADMM) algorithm for the distributed estimator in the linear expectile regression model. Compared to existing approaches, our proposed method can solve expectile regression with distributed data directly, and effectively reduces the storage and transmission cost of distributed computing. We close with some concluding remarks and provide proofs in the Appendix.
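
The communication pattern described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the authors' IL-ADMM implementation: all function names are hypothetical, the data are assumed to be a list of (X, y) shards with the first shard held by the master, and the master's surrogate problem is solved here with iteratively reweighted least squares as a simple stand-in for the ADMM solver developed in the paper.

```python
import numpy as np

def expectile_weights(residuals, tau):
    # Asymmetric weights |tau - 1(u < 0)| of the expectile loss.
    return np.where(residuals < 0, 1.0 - tau, tau)

def expectile_gradient(X, y, beta, tau):
    # Gradient of the local loss (1/n) * sum_i w_i * (y_i - x_i' beta)^2.
    r = y - X @ beta
    return -2.0 * X.T @ (expectile_weights(r, tau) * r) / len(y)

def surrogate_update(X1, y1, beta0, global_grad, tau, n_iter=50):
    # Minimize L_1(beta) - <grad L_1(beta0) - global_grad, beta> on the master.
    # Stand-in solver (assumption): reweighted least squares, not IL-ADMM.
    n = len(y1)
    shift = expectile_gradient(X1, y1, beta0, tau) - global_grad
    beta = beta0.copy()
    for _ in range(n_iter):
        w = expectile_weights(y1 - X1 @ beta, tau)
        XtW = X1.T * w  # scales column i of X1.T by w_i
        beta = np.linalg.solve(XtW @ X1, XtW @ y1 + 0.5 * n * shift)
    return beta

def distributed_expectile(shards, tau, rounds=5):
    # shards[0] plays the role of the master; the rest are workers.
    X1, y1 = shards[0]
    beta = np.linalg.lstsq(X1, y1, rcond=None)[0]  # initial local estimate
    sizes = np.array([len(y) for _, y in shards])
    for _ in range(rounds):
        # Each machine sends only its p-dimensional gradient to the master.
        grads = np.array([expectile_gradient(X, y, beta, tau) for X, y in shards])
        global_grad = (sizes[:, None] * grads).sum(axis=0) / sizes.sum()
        beta = surrogate_update(X1, y1, beta, global_grad, tau)
    return beta
```

Calling `distributed_expectile(shards, tau=0.5)` targets the ordinary least squares fit on the pooled data, since the expectile loss at level 0.5 is a rescaled squared loss; other values of `tau` shift the fit toward the upper or lower part of the response distribution. In each round the raw data never leave their machines, so the transmission cost per round is one gradient vector per worker.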

EXPECTILE REGRESSION WITH DISTRIBUTED DATA
IL-ADMM ALGORITHM
ASYMPTOTIC ANALYSIS
NUMERICAL ANALYSIS
REAL DATA ANALYSIS
Findings
DISCUSSION