Abstract

Class-imbalance learning is a classic problem in the data mining and machine learning communities: the goal is to learn a model that performs equally well on all classes. Most prior work has tackled this problem either in a centralized setting or within a particular domain such as intrusion detection. In this paper, we propose to solve the class-imbalance learning problem on large-scale sparse data in a distributed setting. More specifically, we partition the data across examples and distribute each chunk to a different processing node. Each node runs a local copy of a FISTA-like algorithm, a distributed implementation of the prox-linear algorithm for cost-sensitive learning. We demonstrate the efficacy of the proposed approach on benchmark and real-world data sets and compare its performance with state-of-the-art techniques from the literature.
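The abstract does not spell out the update rules, so the following is only a minimal sketch of the general pattern it describes: example-partitioned data, a cost-weighted loss that penalizes minority-class errors more heavily, and a FISTA-style accelerated proximal-gradient loop whose gradient is aggregated across nodes. The loss (½-scaled cost-weighted squared error with L1 regularization), the function names, and the simulation of nodes as a list of chunks are all assumptions for illustration, not the authors' implementation.

```python
# Sketch only: NOT the paper's algorithm. Assumes a cost-weighted
# squared loss with L1 regularization; nodes are simulated as chunks.
import numpy as np
import scipy.sparse as sp

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def local_grad(X_chunk, y_chunk, w, class_costs):
    """Unnormalized gradient of 0.5 * sum_i c_i (x_i^T w - y_i)^2
    on one node's chunk; c_i up-weights minority-class examples."""
    residual = X_chunk @ w - y_chunk
    costs = np.where(y_chunk > 0, class_costs[1], class_costs[0])
    return X_chunk.T @ (costs * residual)

def distributed_fista(chunks, dim, lam, L, class_costs, iters=100):
    """FISTA over example-partitioned data: each iteration sums the
    per-node gradients (a loop here stands in for communication)."""
    n_total = sum(Xc.shape[0] for Xc, _ in chunks)
    w, w_prev, z, t = np.zeros(dim), np.zeros(dim), np.zeros(dim), 1.0
    for _ in range(iters):
        grad = sum(local_grad(Xc, yc, z, class_costs)
                   for Xc, yc in chunks) / n_total
        # Proximal-gradient step followed by Nesterov momentum update.
        w_prev, w = w, soft_threshold(z - grad / L, lam / L)
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        z = w + ((t - 1.0) / t_next) * (w - w_prev)
        t = t_next
    return w

# Usage on synthetic imbalanced sparse data split across 4 "nodes".
rng = np.random.default_rng(0)
X = sp.random(200, 50, density=0.05, format="csr", random_state=0)
y = rng.choice([-1.0, 1.0], size=200, p=[0.9, 0.1])
chunks = [(X[i:i + 50], y[i:i + 50]) for i in range(0, 200, 50)]
w = distributed_fista(chunks, dim=50, lam=0.01, L=1.0,
                      class_costs={0: 1.0, 1: 9.0})
```

Setting the minority-class cost roughly to the inverse class ratio (9.0 here for a 9:1 imbalance) is one common heuristic for cost-sensitive learning; the paper may choose costs differently.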
