Abstract

The divide and conquer strategy, which breaks a massive data set into a series of manageable data blocks and then combines the independent results of the data blocks to obtain a final decision, has been recognized as a state-of-the-art method for overcoming the challenges of massive data analysis. In this paper, we merge the divide and conquer strategy with local average regression methods to infer the regression relationship of input-output pairs from a massive data set. After theoretically analyzing the pros and cons, we find that although divide and conquer local average regression can reach the optimal learning rate, the restriction on the number of data blocks is rather strong, which makes it feasible only for a small number of data blocks. We then propose two variants to lessen (or remove) this restriction. Our results show that these variants can achieve the optimal learning rate under a much milder restriction (or without such a restriction). Extensive experimental studies are carried out to verify our theoretical assertions.

Highlights

  • Divide and conquer strategies are applicable in many massive data analysis scenarios

  • The average mixture has been shown to be efficient and feasible for global modeling methods such as the conditional maximum entropy model [17], kernel ridge regression [29, 15, 4], kernel-based gradient descent [16] and kernel-based spectral algorithms [3, 11]. Compared with these global modeling methods, local average regression (LAR) [12, 8, 25], such as the Nadaraya-Watson kernel (NWK) and k nearest neighbor (KNN) estimates, is by definition a learning scheme that averages the outputs whose corresponding inputs satisfy certain localization assumptions. LAR is recognized in the literature [12] to carry a lower computational burden and is widely used in image processing [24], recommender systems [2] and financial engineering [13]

  • We show that average mixture local average regression (AVM-LAR) can achieve the optimal learning rate of LAR on the whole data set under some strong restrictions on m, the number of data blocks (see the sketch following this list)
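
A minimal Python sketch of the AVM-LAR scheme, assuming a Gaussian kernel for the NWK estimate on each block; the function names nwk_estimate and avm_lar_estimate and the even splitting of samples are illustrative choices, not the paper's notation:

```python
import numpy as np

def nwk_estimate(x, X, Y, h):
    """Nadaraya-Watson kernel estimate at a query point x on a single data block.
    A Gaussian kernel is assumed here; the paper allows any kernel K: X -> R+."""
    w = np.exp(-np.sum((X - x) ** 2, axis=1) / (2.0 * h ** 2))  # localized weights
    s = w.sum()
    return float(np.dot(w, Y) / s) if s > 0 else 0.0            # weighted average of responses

def avm_lar_estimate(x, X, Y, m, h):
    """AVM-LAR sketch: split the N samples into m blocks, build a LAR estimate on
    each block independently, and combine the m local estimates by averaging."""
    blocks = np.array_split(np.arange(len(X)), m)
    return float(np.mean([nwk_estimate(x, X[idx], Y[idx], h) for idx in blocks]))
```

The paper's analysis concerns how large m can be while this averaged estimate still retains the optimal learning rate of LAR trained on the whole data set.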


Summary

Local average regression

Let D_N = {(X_i, Y_i)}_{i=1}^N be the data set, where X_i ∈ X ⊆ R^d is an explanatory variable and Y_i ∈ [−M, M] is the real-valued response for some 0 < M < ∞. A LAR estimate averages the responses Y_i whose inputs X_i lie close to the query point x, and its localization parameter may depend on the data and on the query point x. Two widely used examples of LAR are the Nadaraya-Watson kernel (NWK) and k nearest neighbor (KNN) estimates. (NWK estimate) Let K : X → R_+ be a kernel function [12] and h > 0 be its localization parameter; in the NWK estimate, the localization parameter depends only on the size of the data. (KNN estimate) We denote the weight of KNN as W_{h,X_i} instead of W_{k,X_i} for the sake of unity, where h = ‖x − X_(k)(x)‖ depends on the distribution of the data and on the query point x
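
A minimal Python sketch of the KNN estimate with the data-dependent localization parameter described above; the name knn_estimate and the use of Euclidean distance are illustrative assumptions:

```python
import numpy as np

def knn_estimate(x, X, Y, k):
    """k nearest neighbor estimate at a query point x.
    The implicit localization parameter h = ||x - X_(k)(x)|| is the distance to the
    k-th nearest neighbor, so it depends on the data and on the query point x."""
    dist = np.linalg.norm(X - x, axis=1)   # Euclidean distances to all inputs
    nearest = np.argsort(dist)[:k]         # indices of the k closest inputs
    return float(Y[nearest].mean())        # average of their responses
```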

Optimal learning rate of LAR
AVM-LAR
AVM-LAR with data-dependent parameters
Qualified AVM-LAR
Simulation 1
Simulation 2
Proofs
Conclusion