Abstract

To balance the explosive growth of data volume against limited budgets for computational resources, one popular approach is to reduce the data volume by drawing a subsample that inherits the relevant properties of the full data. As an alternative to the mean regression model, the quantile regression model has been studied extensively when the data are independent and the data scale is moderate. This article focuses on quantile regression with massive data, where the sample size n (greater than … in general) is extraordinarily large but the dimension d (smaller than 20 in general) is small. We first formulate a general subsampling procedure and establish the asymptotic properties of the resulting estimator. Then, using optimality criteria from experimental design, we derive two sets of subsampling probabilities that are optimal in the sense of minimizing the asymptotic mean squared error. Because the optimal subsampling probabilities depend on the full-data estimator, we develop a two-step optimal subsampling algorithm and study the consistency and asymptotic normality of the resulting estimator. The empirical performance of the optimal subsampling algorithm is evaluated on synthetic and real datasets.
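The abstract does not spell out the subsampling probabilities or the estimator, so the following is only a minimal sketch under stated assumptions, written in Python with NumPy and SciPy. It assumes (1) the subsample estimator minimizes an inverse-probability-weighted check loss, (2) the second-step probabilities take an L-optimality-style form proportional to |τ − 1(residual < 0)|·‖x_i‖ computed from a uniform pilot fit, and (3) the final fit uses only the second-step subsample. None of these forms is quoted from the article, and the function names `weighted_quantile_regression` and `two_step_subsample_qr` are hypothetical. Each weighted fit is solved exactly as a linear program with `scipy.optimize.linprog`.

```python
import numpy as np
from scipy.optimize import linprog


def weighted_quantile_regression(X, y, tau, w):
    """Minimise sum_i w_i * rho_tau(y_i - x_i^T beta), with check loss
    rho_tau(u) = u * (tau - 1{u < 0}).

    LP form: split each residual into u_plus - u_minus (both >= 0), so that
    rho_tau(residual) = tau * u_plus + (1 - tau) * u_minus at the optimum.
    """
    m, d = X.shape
    c = np.concatenate([np.zeros(d), tau * w, (1.0 - tau) * w])
    A_eq = np.hstack([X, np.eye(m), -np.eye(m)])  # X beta + u_plus - u_minus = y
    bounds = [(None, None)] * d + [(0, None)] * (2 * m)  # beta free, slacks >= 0
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:d]


def two_step_subsample_qr(X, y, tau, r0, r, rng):
    """Hypothetical two-step subsampling sketch (assumed probability form)."""
    n = X.shape[0]
    # Step 1: uniform pilot subsample; equal weights give an ordinary QR fit.
    idx0 = rng.choice(n, size=r0, replace=True)
    beta_pilot = weighted_quantile_regression(X[idx0], y[idx0], tau, np.ones(r0))

    # Step 2 (ASSUMED form): probabilities proportional to
    # |tau - 1{pilot residual < 0}| * ||x_i||, evaluated on the full data.
    resid = y - X @ beta_pilot
    scores = np.abs(tau - (resid < 0).astype(float)) * np.linalg.norm(X, axis=1)
    pi = scores / scores.sum()
    idx = rng.choice(n, size=r, replace=True, p=pi)

    # Inverse-probability weights; uniform pi = 1/n gives weight 1.
    w = 1.0 / (n * pi[idx])
    return weighted_quantile_regression(X[idx], y[idx], tau, w)


if __name__ == "__main__":
    rng = np.random.default_rng(2024)
    n, tau = 100_000, 0.5
    X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
    beta_true = np.array([1.0, 2.0, -1.0])
    y = X @ beta_true + rng.standard_normal(n)  # symmetric noise: median = X beta
    beta_hat = two_step_subsample_qr(X, y, tau, r0=200, r=1000, rng=rng)
    print(beta_hat)  # expected to be close to beta_true
```

Only the small subsamples ever enter the linear program, while the full data are touched just once to compute residuals and probabilities. That division of labor is the point of subsampling when n is huge and d is small. The dense LP matrices here are adequate for subsamples of a few thousand rows; larger subsamples would call for a sparse constraint matrix or a dedicated quantile regression solver.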
