Enabling Smart Data: Noise filtering in Big Data classification

Diego García-Gil, Julián Luengo, Salvador García, Francisco Herrera

doi:10.1016/j.ins.2018.12.002

Open Access

https://doi.org/10.1016/j.ins.2018.12.002

Abstract

In any knowledge discovery process, the value of the extracted knowledge is directly related to the quality of the data used. Big Data problems, generated by the massive growth in the scale of data observed in recent years, follow the same dictate. A common problem affecting data quality is the presence of noise, particularly in classification problems, where label noise refers to the incorrect labeling of training instances and is known to be a very disruptive feature of data. However, in this Big Data era, the massive growth in the scale of the data poses a challenge to traditional proposals created to tackle noise, as they have difficulty coping with such a large amount of data. New algorithms need to be proposed to treat the noise in Big Data problems, providing high-quality, clean data, also known as Smart Data. In this paper, two Big Data preprocessing approaches to remove noisy examples are proposed: a homogeneous ensemble filter and a heterogeneous ensemble filter, with special emphasis on their scalability and performance traits. The obtained results show that these proposals enable the practitioner to efficiently obtain a Smart Dataset from any Big Data classification problem.
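
The abstract does not describe the internals of the two filters, so the following is only a rough, single-machine sketch of the general ensemble-filtering idea, not the authors' Big Data implementation. It assumes a cross-validated majority-vote scheme: each instance is predicted by several learners trained on the other folds, and it is dropped when most of them disagree with its label. The names `ensemble_filter`, `nearest_centroid`, and `knn`, as well as the choice of learners and of the voting rule, are hypothetical and chosen for illustration.

```python
import random
from collections import Counter

def nearest_centroid(train, x):
    """Predict the label whose class mean is closest to x."""
    sums, counts = {}, Counter()
    for feats, label in train:
        acc = sums.setdefault(label, [0.0] * len(feats))
        for i, v in enumerate(feats):
            acc[i] += v
        counts[label] += 1
    def dist(label):
        return sum((s / counts[label] - v) ** 2 for s, v in zip(sums[label], x))
    return min(sums, key=dist)

def knn(k):
    """Build a k-nearest-neighbour predictor (majority label of the k closest points)."""
    def predict(train, x):
        nearest = sorted(
            train, key=lambda p: sum((a - b) ** 2 for a, b in zip(p[0], x))
        )[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]
    return predict

def ensemble_filter(data, learners, n_folds=4, seed=0):
    """Split data into (clean, removed) by cross-validated majority vote.

    data: list of (features, label) pairs.
    learners: callables taking (train, features) and returning a label.
    An instance is removed when more than half of the learners, trained
    on the folds that exclude it, predict a label different from its own.
    """
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    wrong_votes = [0] * len(data)
    for fold in folds:
        held_out = set(fold)
        train = [data[i] for i in idx if i not in held_out]
        for i in fold:
            feats, label = data[i]
            wrong_votes[i] += sum(
                learner(train, feats) != label for learner in learners
            )
    clean = [d for d, w in zip(data, wrong_votes) if 2 * w <= len(learners)]
    removed = [d for d, w in zip(data, wrong_votes) if 2 * w > len(learners)]
    return clean, removed
```

Passing different learner types, for example `[nearest_centroid, knn(1), knn(3)]`, loosely mirrors a heterogeneous ensemble. A homogeneous variant would presumably reuse a single learner type over different partitions of the data, which this sketch does not attempt.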

Journal: Information Sciences
Publication Date: Dec 3, 2018
Citations: 118

Similar Papers

Imperfect Big Data
Julián Luengo ... Sergio Ramírez-Gallego
01 Jan 2020

Redundancy and Complexity Metrics for Big Data Classification: Towards Smart Data
Jesus Maillo ... Isaac Triguero
IEEE Access | VOL. 8
01 Jan 2020

MRQAR: A generic MapReduce framework to discover quantitative association rules in big data problems
D Martín ... J.C Riquelme-Santos
Knowledge-Based Systems | VOL. 153
30 Apr 2018

Chi-BD-DRF: Design of Scalable Fuzzy Classifiers for Big Data via A Dynamic Rule Filtering Approach
Fatemeh Aghaeipoor ... Mohammad Masoud Javidi
01 Jul 2020
