Abstract

In Big Data applications, massive datasets with huge numbers of observations are frequently encountered. To analyze such datasets, a divide-and-conquer scheme (e.g., MapReduce) is often adopted: a large dataset (e.g., a centralized real database or a virtual database with distributed data sources) is first divided into smaller, manageable segments, and the final output is then aggregated from the individual outputs of the segments. Despite its popularity in practice, it remains largely unknown whether such a distributed strategy provides valid theoretical inference for the original data. In this paper, we address this fundamental issue for the distributed kernel regression (DKR) problem, where algorithmic feasibility is measured by the generalization performance of the resulting estimator. To justify DKR, a uniform convergence rate is needed to bound the generalization error over the individual outputs, which raises new and challenging issues in the Big Data setup. Using a sample-dependent kernel dictionary, we show that, with proper data segmentation, DKR leads to an estimator that is generalization consistent for the unknown regression function. This result theoretically justifies DKR and sheds light on more advanced distributed algorithms for processing Big Data. The promising performance of the method is supported by both simulation and real-data examples.
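
To make the divide-and-conquer idea concrete, the sketch below shows one simple instantiation of distributed kernel regression: the data are split into m segments, a kernel ridge regressor with a Gaussian kernel is fitted on each segment, and the segment predictions are averaged. This is an illustrative approximation rather than the paper's exact estimator (which builds on a sample-dependent kernel dictionary); the function names, the kernel choice, the ridge penalty lam, the bandwidth, and the number of segments m are assumptions made for demonstration only.

```python
import numpy as np

def gaussian_kernel(A, B, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def fit_segment(X, y, lam=1e-2, bandwidth=1.0):
    """Kernel ridge regression on a single data segment; returns a predictor."""
    n = len(X)
    K = gaussian_kernel(X, X, bandwidth)
    alpha = np.linalg.solve(K + lam * n * np.eye(n), y)
    return lambda X_new: gaussian_kernel(X_new, X, bandwidth) @ alpha

def dkr_predict(X, y, X_new, m=10, lam=1e-2, bandwidth=1.0, seed=0):
    """Divide-and-conquer: fit each segment separately, then average the outputs."""
    rng = np.random.default_rng(seed)
    segments = np.array_split(rng.permutation(len(X)), m)
    preds = [fit_segment(X[s], y[s], lam, bandwidth)(X_new) for s in segments]
    return np.mean(preds, axis=0)

# Toy usage: recover f(x) = sin(2*pi*x) from 2000 noisy observations.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.uniform(0.0, 1.0, size=(2000, 1))
    y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(2000)
    X_grid = np.linspace(0.0, 1.0, 200)[:, None]
    f_hat = dkr_predict(X, y, X_grid, m=20)
    print("max abs error:", np.abs(f_hat - np.sin(2 * np.pi * X_grid[:, 0])).max())
```

Averaging the segment outputs keeps each local fit cheap (each kernel system is only n/m by n/m) while the aggregation step recovers accuracy, which is the trade-off the paper studies theoretically.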
