Abstract

Nowadays, many disciplines have to deal with big datasets that additionally involve a high number of features. Feature selection methods aim to eliminate noisy, redundant, or irrelevant features that may deteriorate classification performance. However, traditional methods lack the scalability needed to cope with datasets of millions of instances and to deliver results within an acceptable time. This paper presents a feature selection algorithm based on evolutionary computation that uses the MapReduce paradigm to obtain subsets of features from big datasets. The algorithm decomposes the original dataset into blocks of instances and learns from them in the map phase; the reduce phase then merges the partial results into a final vector of feature weights, which allows a flexible application of the feature selection procedure: a threshold on the weights determines the selected subset of features. The feature selection method is evaluated using three well-known classifiers (SVM, Logistic Regression, and Naive Bayes) implemented within the Spark framework to address big data problems. In the experiments, datasets of up to 67 million instances and up to 2000 attributes have been handled, showing that this is a suitable framework to perform evolutionary feature selection and that it improves both classification accuracy and runtime when dealing with big data problems.
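
As a rough illustration of this map/reduce scheme, the sketch below (written against the Spark RDD API in Scala) splits the data into blocks, lets each block produce a partial vector of feature weights, merges the partial vectors in a reduce step, and applies a threshold to pick the selected features. The function evolutionaryWeights, the averaging of the merged weights, and all parameter names are illustrative assumptions, not the authors' implementation.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.mllib.regression.LabeledPoint

object EFSSketch {

  // Hypothetical per-block routine: in the paper this would be an evolutionary
  // (genetic) search that assigns a weight to every feature; here it is a stub.
  def evolutionaryWeights(block: Seq[LabeledPoint], numFeatures: Int): Array[Double] =
    Array.fill(numFeatures)(0.0) // placeholder weights

  // Map phase: each block of instances yields a partial vector of feature weights.
  // Reduce phase: the partial vectors are merged (summed, then averaged here)
  // and a threshold selects the final subset of feature indices.
  def selectFeatures(data: RDD[LabeledPoint], numFeatures: Int,
                     numBlocks: Int, threshold: Double): Array[Int] = {
    val partial = data.repartition(numBlocks).mapPartitions { instances =>
      Iterator(evolutionaryWeights(instances.toSeq, numFeatures))
    }
    val merged = partial.reduce((a, b) => a.zip(b).map { case (x, y) => x + y })
    merged.map(_ / numBlocks)
          .zipWithIndex
          .collect { case (weight, index) if weight >= threshold => index }
  }
}
```

The returned indices could then be used to project the dataset before training the SVM, Logistic Regression, or Naive Bayes classifiers mentioned above.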

Highlights

  • Learning from very large databases is a major issue for most of the current data mining and machine learning algorithms [1]

  • This paper presents a feature selection algorithm based on evolutionary computation that uses the MapReduce paradigm to obtain subsets of features from big datasets

  • The main objective of this paper is to enable Evolutionary Feature Selection (EFS) models to be applied on big data

Summary

Introduction

Learning from very large databases is a major issue for most of the current data mining and machine learning algorithms [1]. This problem is commonly referred to as “big data,” a term that denotes the difficulties and disadvantages of processing and analyzing huge amounts of data [2,3,4]. The MapReduce paradigm [8,9] and its distributed file system [10], originally introduced by Google, offer an effective and robust framework to address the analysis of big datasets. This approach is currently preferred in data mining over other parallelization schemes, such as MPI (Message Passing Interface) [11], because of its fault-tolerant mechanism and its simplicity. Feature selection reduces the dimensionality of such datasets; this reduction facilitates the understanding of the extracted patterns and speeds up subsequent learning stages.

