A feature selection Bayesian approach for extracting classification rules with a clustering genetic algorithm

Estevam R Hruschka, Eduardo R Hruschka, Nelson F F Ebecken

doi:10.1080/713827176

Abstract

This work describes a method that combines a Bayesian feature selection approach with a clustering genetic algorithm to extract classification rules in data-mining applications. A Bayesian network is generated from a data set, and the Markov blanket of the class variable is used as the selected feature subset. The rule extraction method is simple: the clustering process is applied to the examples of each class separately, so that clusters of similar examples are found for each class. These clusters can be viewed as subclasses and can, consequently, be modeled as logical rules. In this context, the problem of finding the optimal number of classification rules can be viewed as the problem of finding the best number of clusters. The Clustering Genetic Algorithm, which can find the best clustering in a data set according to the Average Silhouette Width criterion, was applied to extract classification rules. The proposed methodology is illustrated by means of simulations on three data sets that are benchmarks for data-mining methods: Wisconsin Breast Cancer, Mushroom, and Congressional Voting Records. The rules extracted with all the attributes are compared to those extracted with only the features belonging to the Markov blanket, and the results show that the proposed method is very promising.
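The two building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the Bayesian network is already available as a dictionary mapping each node to its set of parents, it assumes Euclidean distance between examples, and it does not implement the Clustering Genetic Algorithm itself. The function names `markov_blanket` and `average_silhouette_width` are hypothetical, chosen for illustration only.

```python
from math import dist  # Euclidean distance, Python 3.8+


def markov_blanket(parents, target):
    """Return the Markov blanket of `target` in a DAG.

    `parents` maps every node to the set of its parents (assumed input format).
    The blanket is the union of the target's parents, its children,
    and the other parents of those children.
    """
    children = {node for node, ps in parents.items() if target in ps}
    spouses = {p for child in children for p in parents[child]}
    return (set(parents.get(target, ())) | children | spouses) - {target}


def average_silhouette_width(points, labels):
    """Mean silhouette value over all points, using Euclidean distance."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    total = 0.0
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            continue  # singleton clusters contribute a silhouette of 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(dist(point, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != label
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


# Hypothetical toy network: A -> class -> B <- C, with D unconnected.
toy_parents = {"A": set(), "C": set(), "D": set(),
               "class": {"A"}, "B": {"class", "C"}}
selected = markov_blanket(toy_parents, "class")

# Hypothetical examples of one class, clustered in two different ways.
examples = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = average_silhouette_width(examples, [0, 0, 1, 1])
bad = average_silhouette_width(examples, [0, 1, 0, 1])
```

In the toy network, `selected` contains `A`, `B`, and `C` but not `D`, which mirrors how features outside the Markov blanket of the class variable are dropped. For the toy examples, the labelling that groups nearby points scores a higher average silhouette width than the one that mixes them, which is the sense in which the criterion can rank candidate clusterings and, with them, candidate numbers of rules.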
