Parallel Implementation of Classification Algorithms Based on Cloud Computing Environment

Lijuan Zhou,Hui Wang,Wenbo Wang

doi:10.11591/telkomnika.v10i5.1353

Abstract

As an important task of data mining, Classification has been received considerable attention in many applications, such as information retrieval, web searching, etc. The enlarging volumes of information emerging by the progress of technology and the growing individual needs of data mining, makes classifying of very large scale of data a challenging task. In order to deal with the problem, many researchers try to design efficient parallel classification algorithms. This paper introduces the classification algorithms and cloud computing briefly, based on it analyses the bad points of the present parallel classification algorithms, then addresses a new model of parallel classifying algorithms. And it mainly introduces a parallel Naive Bayes classification algorithm based on MapReduce, which is a simple yet powerful parallel programming technique. The experimental results demonstrate that the proposed algorithm improves the original algorithm performance, and it can process large datasets efficiently on commodity hardware.

Full Text