Abstract

Because of the growing volume of data and the increasingly personalized requirements of data mining, traditional centralized data mining methods can no longer meet this demand. Cloud computing provides an inexpensive solution for storing, analyzing, and processing massive data. To achieve parallel data mining in a cloud environment, this paper proposes an improved algorithm based on the traditional Naive Bayes classifier. First, the design of the improved algorithm within the MapReduce programming model is presented. The algorithm is then tested on real data. The experimental results show that the new algorithm achieves higher performance and better scalability.
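The abstract does not give the details of the improved algorithm. As a rough illustration of why Naive Bayes fits the MapReduce model at all, the sketch below writes the standard (unimproved) categorical Naive Bayes training step as map and reduce functions. Training reduces to counting class and (feature, value, class) occurrences, and counts computed on separate data splits can simply be summed. This is a single-process Python simulation; the function names, key layout, round-robin data split, toy data, and Laplace smoothing are assumptions for illustration, not the authors' design.

```python
import math
from collections import defaultdict


def map_record(record):
    """Map phase (assumed key layout): emit (key, 1) pairs for one (features, label) record."""
    features, label = record
    yield ("class", label), 1
    for index, value in enumerate(features):
        yield ("feature", index, value, label), 1


def reduce_counts(pairs):
    """Reduce phase: sum all counts emitted under the same key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def train(records, num_splits=2):
    """Simulate a MapReduce job: map and locally combine each split, then merge globally."""
    # Hypothetical round-robin split, standing in for data blocks on separate nodes.
    splits = [records[i::num_splits] for i in range(num_splits)]
    # Each split is mapped and combined independently, so this step could run in parallel.
    partials = [reduce_counts(pair for r in split for pair in map_record(r)) for split in splits]
    # Global reduce: add the partial counts key by key.
    return reduce_counts(pair for partial in partials for pair in partial.items())


def predict(counts, features, num_values):
    """Return the label maximizing log P(c) + sum_i log P(x_i | c), with Laplace smoothing.

    num_values[i] is the number of distinct values feature i can take.
    """
    class_counts = {key[1]: n for key, n in counts.items() if key[0] == "class"}
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_c in class_counts.items():
        score = math.log(n_c / total)
        for i, value in enumerate(features):
            n_ivc = counts.get(("feature", i, value, label), 0)
            score += math.log((n_ivc + 1) / (n_c + num_values[i]))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data (invented for illustration): (outlook, temperature) -> play?
data = [
    (("sunny", "hot"), "no"),
    (("sunny", "mild"), "no"),
    (("rainy", "mild"), "yes"),
    (("overcast", "hot"), "yes"),
    (("rainy", "cool"), "yes"),
]

counts = train(data, num_splits=2)
print(counts[("class", "yes")])                      # 3
print(predict(counts, ("sunny", "hot"), [3, 3]))     # no
```

Because the merge is plain addition, which is associative and commutative, the result does not depend on how the data is split. That property is what lets the counting step scale out across machines.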