Abstract

To address the limitations of incremental learning on massive, imbalanced data streams, this paper proposes a cost-sensitive incremental classification approach under the MapReduce framework for imbalanced massive data streams (CILCIDS). First, we present a cost-sensitive concept drift detection method for massive data streams under the MapReduce framework, which counts the recession numbers of the pure and tolerance clusters. Second, we present a cost-sensitive SVM algorithm based on incremental learning: newly arriving samples are divided into two parts, and only the samples that violate the KKT conditions are used for incremental learning. Finally, the imbalanced massive data streams are partitioned under the MapReduce framework and processed in parallel. A cost-sensitive incremental learning classifier is developed on a cloud computing platform, and a weighted cost-sensitive ensemble classifier is constructed. Experiments show that the proposed incremental learning algorithm under the MapReduce framework is feasible and correct. Compared with other classification algorithms for imbalanced data streams, CILCIDS achieves high performance and deals effectively with imbalanced data streams that exhibit concept drift.
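
The KKT-based split of incoming samples can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a linear SVM with an already trained weight vector `w` and bias `b`, labels in {-1, +1}, and the standard criterion that a new sample (whose multiplier is zero) satisfies the KKT conditions when y * f(x) >= 1. The function names `decision_value` and `split_by_kkt`, the tolerance `tol`, and the toy data are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple

# A sample is (feature vector, label), with label in {-1, +1}.
Sample = Tuple[Sequence[float], int]


def decision_value(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Linear SVM decision function f(x) = w . x + b (linear kernel assumed)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def split_by_kkt(
    w: Sequence[float], b: float, batch: List[Sample], tol: float = 1e-9
) -> Tuple[List[Sample], List[Sample]]:
    """Split a new batch into KKT violators and samples that satisfy KKT.

    A new sample has multiplier 0, so it satisfies the KKT conditions of the
    current model when y * f(x) >= 1. Samples with y * f(x) < 1 violate them
    and are the ones kept for the incremental update.
    """
    violators: List[Sample] = []
    satisfied: List[Sample] = []
    for x, y in batch:
        if y * decision_value(w, b, x) < 1.0 - tol:
            violators.append((x, y))
        else:
            satisfied.append((x, y))
    return violators, satisfied


# Toy model and batch (hypothetical values).
w, b = [1.0, 0.0], 0.0
batch = [
    ([2.0, 0.0], 1),    # y*f(x) = 2.0  -> satisfies KKT
    ([0.5, 0.0], 1),    # y*f(x) = 0.5  -> violates KKT (inside the margin)
    ([-3.0, 1.0], -1),  # y*f(x) = 3.0  -> satisfies KKT
    ([0.2, 0.0], -1),   # y*f(x) = -0.2 -> violates KKT (misclassified)
]
violators, satisfied = split_by_kkt(w, b, batch)
```

Only `violators` would be passed to the incremental update in this sketch; samples in `satisfied` cannot change the current solution and can be discarded. How the class-dependent misclassification costs enter the update is not specified in the abstract and is left out here.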
