Abstract

This work describes a method that combines Bayesian feature selection with a clustering genetic algorithm to extract classification rules in data-mining applications. A Bayesian network is learned from a data set, and the Markov blanket of the class variable is used to select the feature subset. The rule extraction method itself is simple: the examples of each class are clustered separately, so that clusters of similar examples are found within each class. These clusters can be viewed as subclasses and can consequently be modeled as logical rules. In this context, finding the optimal number of classification rules amounts to finding the best number of clusters. The Clustering Genetic Algorithm searches for the best clustering of a data set according to the Average Silhouette Width criterion, and it is applied here to extract classification rules. The proposed methodology is illustrated by experiments on three benchmark data sets for data-mining methods: Wisconsin Breast Cancer, Mushroom, and Congressional Voting Records. The rules extracted using all attributes are compared with those extracted using only the features in the Markov blanket, and the results indicate that the proposed method is promising.
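
The two building blocks named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it does not learn a Bayesian network or run the Clustering Genetic Algorithm. It only shows, under stated assumptions, how a Markov blanket is read off a given directed acyclic graph (the parents, the children, and the children's other parents of the class variable) and how the Average Silhouette Width scores a candidate clustering. The function names, the toy graph, and the toy points are hypothetical, and Euclidean distance is assumed.

```python
from math import dist
from statistics import mean


def markov_blanket(parents, target):
    """Return the Markov blanket of `target` in a DAG.

    `parents` maps every node to the set of its parents (hypothetical format).
    The blanket is: parents, children, and the children's other parents.
    """
    children = {node for node, ps in parents.items() if target in ps}
    spouses = {p for child in children for p in parents[child]}
    return (set(parents[target]) | children | spouses) - {target}


def average_silhouette_width(points, labels):
    """Mean silhouette over all points, assuming Euclidean distance.

    For each point: a = mean distance to the other members of its cluster,
    b = smallest mean distance to the members of any other cluster,
    silhouette = (b - a) / max(a, b). Singleton clusters score 0 by convention.
    """
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    widths = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        own = [dist(p, q)
               for j, (q, other) in enumerate(zip(points, labels))
               if other == lab and j != i]
        if not own:  # singleton cluster
            widths.append(0.0)
            continue
        a = mean(own)
        b = min(
            mean([dist(p, q) for q, other in zip(points, labels) if other == c])
            for c in clusters - {lab}
        )
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return mean(widths)


# Toy DAG (hypothetical): A -> Class -> C <- B, and C -> D.
toy_parents = {"A": set(), "B": set(), "Class": {"A"},
               "C": {"Class", "B"}, "D": {"C"}}
print(sorted(markov_blanket(toy_parents, "Class")))  # ['A', 'B', 'C']

# Toy examples of one class (hypothetical): two well-separated groups.
toy_points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = average_silhouette_width(toy_points, [0, 0, 1, 1])
bad = average_silhouette_width(toy_points, [0, 1, 0, 1])
print(good > bad)  # True: the natural grouping scores higher
```

In the setting described above, a search procedure would compare candidate clusterings of each class by this score, and the number of clusters in the best-scoring one would give the number of rules for that class.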

