Distributed classification in large-scale P2P networks has gained relevance in recent years and supports applications such as distributed intrusion detection in P2P monitoring environments, online match-making, personalized information retrieval, distributed document classification in P2P media repositories, and P2P recommender systems, to mention a few. However, classification in a P2P network is challenging because of several constraints: centralizing the data is not feasible, communication bandwidth is scarce, and any method must cope with scalability, synchronization, and peer dynamism. Moreover, because they do not consider the data distributions and topologies of real-world P2P systems, most existing distributed classification approaches fall short in both predictive and network-cost performance. In this paper, we investigate a collaborative classification method (TRedSVM) based on Support Vector Machines (SVM) in scale-free P2P networks. In particular, we demonstrate how to construct an SVM classifier in real-world P2P networks, which exhibit an inherently skewed distribution of node links and, in turn, of data. The proposed method propagates the most influential instances of SVM models to the vast majority of scarcely connected peers in a controlled way, which improves their local classification accuracy while keeping the communication cost low throughout the network. Besides extensive experimental evaluations on benchmark machine learning data sets, we evaluate the proposed method on music genre classification to show its performance in a real application scenario. Additionally, we analyze its performance with respect to centralized approaches, data replication in P2P networks, and the cost-accuracy trade-off. TRedSVM outperforms baseline model-propagation approaches, substantially improving overall classification performance at the cost of a tolerable increase in communication.