Abstract

Multi-label classification is required in many modern data mining applications. The k-nearest neighbour (KNN) join is a useful data mining approach that achieves high accuracy but is time-consuming. With the recent explosion of big data, conventional serial KNN join based multi-label classification algorithms spend a long time handling high volumes of data. To address this problem, we first design a parallel MapReduce based KNN join algorithm for big data classification. We then implement the algorithm using Hadoop on a cluster of 9 virtual machines. Experimental results show that our MapReduce based KNN join achieves much higher performance than the serial one. Several interesting phenomena are observed in the experimental results.
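
As a rough illustration of the general idea of a MapReduce-style KNN join for multi-label classification, the following Python sketch simulates the map and reduce phases in a single process. It is not the algorithm described in the paper: the block partitioning, the function names (map_phase, reduce_phase, predict_labels, knn_join_classify), the Euclidean distance, and the majority-vote threshold are all assumptions made for this example, and no Hadoop API is used.

```python
import heapq
from collections import defaultdict
from math import dist


def map_phase(block, queries, k):
    """Map step (assumed design): for one block of training data, emit the
    k nearest candidates found in that block for every query point."""
    for qid, q in enumerate(queries):
        candidates = ((dist(q, x), labels) for x, labels in block)
        for cand in heapq.nsmallest(k, candidates, key=lambda t: t[0]):
            yield qid, cand


def reduce_phase(pairs, k):
    """Reduce step (assumed design): group candidates by query id and keep
    the k globally nearest ones."""
    grouped = defaultdict(list)
    for qid, cand in pairs:
        grouped[qid].append(cand)
    return {
        qid: heapq.nsmallest(k, cands, key=lambda t: t[0])
        for qid, cands in grouped.items()
    }


def predict_labels(neighbours, threshold=0.5):
    """Assign every label carried by more than `threshold` of the neighbours
    (a simple voting rule chosen for illustration)."""
    counts = defaultdict(int)
    for _, labels in neighbours:
        for label in labels:
            counts[label] += 1
    return {l for l, c in counts.items() if c / len(neighbours) > threshold}


def knn_join_classify(train, queries, k, n_blocks):
    """Split the training set into blocks, run the map step on each block,
    merge in the reduce step, then vote on labels per query."""
    blocks = [train[i::n_blocks] for i in range(n_blocks)]
    pairs = (p for block in blocks for p in map_phase(block, queries, k))
    merged = reduce_phase(pairs, k)
    return [predict_labels(merged[qid]) for qid in range(len(queries))]
```

Because each block only needs to report its own k best candidates per query, the map calls are independent of one another, which is what would allow them to run on separate machines in a real cluster.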
