On The Combination of Feature and Instance Selection

Jerffeson Teixeira de Souza, Rafael Augusto Ferreira do Carmo, Gustavo Augusto Campos de Lim

doi:10.5772/9153

Abstract

In recent decades, huge amounts of data have become omnipresent in diverse areas of knowledge, such as business, astronomy, and biology. Machine Learning and Knowledge Discovery in Databases (KDD) are fields of Computer Science that focus on transforming these data into useful knowledge. Fayyad et al. (1996) define KDD as “the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data”. Feature and Instance Selection belong to data preparation (or pre-processing), a preliminary process that transforms raw data into a format convenient for the data mining (or machine learning) algorithm. Data is usually stored in a table-like format: the columns are the attributes, or features, which describe the data, and the rows are the records, or instances, which are the examples of the concept stored in the data. Feature and Instance Selection allow applications such as classification or clustering to focus only on the attributes and records that are relevant to the concept under study. As important machine learning problems, Feature and Instance Selection have been studied systematically over the last decades, during which several algorithms for solving each of them individually have been proposed. Such selection problems play a fundamental role in the pre-processing step of any learning task. By removing noise and irrelevant or redundant features and instances, and by reducing the overall dimensionality of a dataset, feature and instance selection have been shown to improve the performance of most machine learning algorithms, speed up the production of models, and allow algorithms to handle very large datasets.
Even though the specialized literature has reported remarkable results in solving the feature and instance selection problems individually, little work has been done to make these solutions work together to solve the two related problems simultaneously, or even to understand the relationship between features and instances. This chapter first discusses the feature and instance selection problems and their relevance to machine learning, giving a precise definition of both problems. Next, it surveys different approaches for dealing with feature selection and instance selection separately, as well as some works that tried to integrate the solutions for these two problems, ...
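
As an illustration of the table view described in the abstract (columns as features, rows as instances), the following minimal sketch reduces a small dataset along both axes. It is not a method from the chapter: the variance filter, the duplicate-removal rule, the function names, the threshold, and the toy data are all assumptions chosen only to keep the example short.

```python
from statistics import pvariance

def select_features(rows, min_variance):
    """Return indices of columns whose population variance exceeds min_variance.

    Assumed stand-in for a real feature selection method: a near-constant
    column is treated as irrelevant.
    """
    columns = list(zip(*rows))
    return [j for j, col in enumerate(columns) if pvariance(col) > min_variance]

def select_instances(rows, labels):
    """Return indices of the first occurrence of each distinct (row, label) pair.

    Assumed stand-in for a real instance selection method: an exact duplicate
    with the same label is treated as redundant.
    """
    seen = set()
    kept = []
    for i, (row, label) in enumerate(zip(rows, labels)):
        key = (tuple(row), label)
        if key not in seen:
            seen.add(key)
            kept.append(i)
    return kept

def reduce_dataset(rows, labels, min_variance):
    """Select features first, then select instances on the projected table."""
    feature_idx = select_features(rows, min_variance)
    projected = [[row[j] for j in feature_idx] for row in rows]
    instance_idx = select_instances(projected, labels)
    reduced_rows = [projected[i] for i in instance_idx]
    reduced_labels = [labels[i] for i in instance_idx]
    return reduced_rows, reduced_labels, feature_idx, instance_idx

# Toy data (hypothetical): column 0 is almost constant, and rows 0 and 3
# differ only in that column.
rows = [
    [1.0, 5.0, 0.2],
    [1.0, 5.0, 0.9],
    [1.0, 7.0, 0.2],
    [1.001, 5.0, 0.2],
]
labels = ["a", "b", "a", "a"]

reduced_rows, reduced_labels, feature_idx, instance_idx = reduce_dataset(
    rows, labels, min_variance=0.01
)
print(feature_idx)   # indices of the kept columns
print(instance_idx)  # indices of the kept rows
```

In this toy table, rows 0 and 3 become identical only after column 0 is dropped, so the outcome of the instance step depends on the feature step. That dependence is one simple way to see why the two problems might be treated together rather than one after the other.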

Publication Date: Feb 1, 2010
Citations: 15
License type: cc-by-sa
