A MapReduce-Based k-Nearest Neighbor Approach for Big Data Classification

Jesus Maillo, Isaac Triguero, Francisco Herrera

doi:10.1109/trustcom.2015.577

Abstract

The k-Nearest Neighbor (k-NN) classifier is one of the best-known methods in data mining because of its effectiveness and simplicity. Because it compares each unseen case against the stored training examples, its application may be restricted to problems with a limited number of examples, especially when runtime matters. However, classifying large amounts of data is becoming a necessary task in a great number of real-world applications. This topic is known as big data classification, in which standard data mining techniques normally fail to tackle such volumes of data. In this contribution we propose a MapReduce-based approach for k-Nearest Neighbor classification. This model allows us to simultaneously classify large amounts of unseen cases (test examples) against a big (training) dataset. To do so, the map phase determines the k nearest neighbors in different splits of the data. Afterwards, the reduce stage computes the definitive neighbors from the lists obtained in the map phase. The designed model allows the k-Nearest Neighbor classifier to scale to datasets of arbitrary size, simply by adding more computing nodes if necessary. Moreover, this parallel implementation provides exactly the same classification rate as the original k-NN model. The experiments conducted, using a dataset with up to 1 million instances, show the promising scalability of the proposed approach.
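The map and reduce phases described in the abstract can be illustrated with a minimal single-process sketch in plain Python. It is not the authors' Hadoop implementation: the function names (`map_knn`, `reduce_knn`, `knn_mapreduce`), the use of Euclidean distance, the majority vote, and the round-robin way of splitting the training set are all assumptions made for illustration.

```python
import heapq
import math
from collections import Counter


def map_knn(train_split, test_set, k):
    """Map phase (sketch): for every test example, find its k nearest
    neighbors inside ONE split of the training data.

    train_split: list of (features, label) pairs
    test_set:    list of feature tuples
    Returns {test_id: [(distance, label), ...]} with at most k candidates each.
    """
    output = {}
    for test_id, x in enumerate(test_set):
        # Euclidean distance is an assumption; any metric could be used here.
        candidates = [(math.dist(x, features), label)
                      for features, label in train_split]
        output[test_id] = heapq.nsmallest(k, candidates)
    return output


def reduce_knn(map_outputs, k):
    """Reduce phase (sketch): merge the candidate lists produced by all map
    tasks, keep the k globally nearest neighbors, and predict by majority vote.
    """
    merged = {}
    for output in map_outputs:
        for test_id, candidates in output.items():
            merged.setdefault(test_id, []).extend(candidates)

    predictions = {}
    for test_id, candidates in merged.items():
        nearest = heapq.nsmallest(k, candidates)
        votes = Counter(label for _, label in nearest)
        predictions[test_id] = votes.most_common(1)[0][0]
    return predictions


def knn_mapreduce(train, test_set, k, n_splits):
    """Run the map tasks sequentially over n_splits splits, then reduce.
    In a real cluster each map_knn call would run on a different node."""
    splits = [train[i::n_splits] for i in range(n_splits)]  # round-robin split
    map_outputs = [map_knn(split, test_set, k) for split in splits]
    return reduce_knn(map_outputs, k)
```

Because every split forwards its own k best candidates, the true k nearest neighbors are always among the merged candidates, so the reduce step recovers the same neighbors as a search over the whole training set (up to how distance ties are broken). This is why the number of splits does not change the classification result in the sketch.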

20 Aug 2015