A Scalable Local Algorithm for Distributed Multivariate Regression

Kanishka Bhaduri, Hillol Kargupta

doi:10.1002/sam.10009

Abstract

This paper offers a local distributed algorithm for multivariate regression in large peer‐to‐peer environments. The algorithm can be used for distributed inferencing, data compaction, data modeling, and classification tasks in many emerging peer‐to‐peer applications for bioinformatics, astronomy, social networking, sensor networks, and web mining. Computing a global regression model from data available at the different peer nodes using a traditional centralized regression algorithm can be very costly and impractical because of the large number of data sources, the asynchronous nature of peer‐to‐peer networks, and the dynamic nature of the data and the network. This paper proposes a two‐step approach to deal with this problem. First, it offers an efficient local distributed algorithm that monitors the “quality” of the current regression model. Second, if the model is outdated, it uses this algorithm as a feedback mechanism for rebuilding the model. The local nature of the monitoring algorithm guarantees low monitoring cost. Experimental results presented in this paper strongly support the theoretical claims. © 2008 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 1: 000‐000, 2008
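The monitor-then-rebuild idea can be illustrated with a minimal sketch. It is not the paper's algorithm: it is a centralized, single-process simulation that assumes one input feature, a mean-squared-error quality measure, and a threshold named epsilon. The function names (fit_line, mean_squared_error, monitor_and_rebuild) are hypothetical and chosen only for illustration. The local, communication-efficient monitoring that the paper proposes is deliberately not reproduced here.

```python
from statistics import fmean


def fit_line(points):
    """Ordinary least squares for y = a*x + b on pooled (x, y) pairs."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    a = sxy / sxx
    return a, my - a * mx


def mean_squared_error(model, peers):
    """Average squared prediction error over the data held by all peers."""
    a, b = model
    return fmean((y - (a * x + b)) ** 2 for data in peers for x, y in data)


def monitor_and_rebuild(model, peers, epsilon):
    """Keep the model while its error stays within epsilon; otherwise refit.

    Assumption: a real peer-to-peer system would evaluate this condition
    locally at each peer instead of pooling every data point as done here.
    """
    if mean_squared_error(model, peers) <= epsilon:
        return model, False
    pooled = [point for data in peers for point in data]
    return fit_line(pooled), True


# Two simulated peers whose data has drifted from y = 2x to y = 3x + 1.
peers = [[(0, 1), (1, 4)], [(2, 7), (3, 10)]]
model, rebuilt = monitor_and_rebuild((2.0, 0.0), peers, epsilon=0.1)
```

In this toy run the old model no longer fits the drifted data, so the check fails and the model is refitted on the pooled points. A model that still fits would be returned unchanged.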

Full Text