Two layer algorithm for data classification based on rough set and Bayesian network classifiers

Marzieh Mirzakhani, Amirmasoud Eftekhari Moghadam

doi:10.1109/ifsc.2013.6675587

Abstract

Data classification, especially text classification, has been one of the key subjects in intelligent information processing due to the enormous growth of digital content available online. Because the feature space of most data types is high-dimensional, reducing the feature space while improving classification accuracy is an important and difficult problem. Rough set theory is a powerful tool for dealing with uncertainty, which makes it well suited to feature reduction. Bayesian networks are also among the most powerful tools for designing expert systems that operate under uncertainty. In this paper, we propose a two-layer algorithm for data classification. In the first layer, it uses rough set theory and conditional entropy for feature selection and then, through the concept of rough membership degree, classifies objects with a high membership degree with certainty. In the second layer, the remaining objects are classified with Bayesian network classifiers, for instance Tree Augmented Naive Bayes and a general Bayesian network classifier with two different search approaches. Finally, the proposed algorithm is evaluated on the Reuters-21578 collection and four UCI data sets.
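The two layers described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses the textbook definitions of conditional entropy H(D | B) and rough membership |[x]_B ∩ X| / |[x]_B|, and every name in it (`partition`, `conditional_entropy`, `rough_membership`, `two_layer_classify`, the `threshold` parameter, and the `fallback` callable standing in for a Bayesian network classifier) is an assumption made for illustration.

```python
from collections import Counter, defaultdict
from math import log2


def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return blocks


def conditional_entropy(rows, labels, attrs):
    """H(D | attrs): lower values mean attrs determine the label better."""
    n = len(rows)
    h = 0.0
    for idx in partition(rows, attrs).values():
        weight = len(idx) / n
        for count in Counter(labels[i] for i in idx).values():
            p = count / len(idx)
            h -= weight * p * log2(p)
    return h


def rough_membership(rows, labels, attrs, x):
    """Share of each label inside the equivalence class that x falls into."""
    key = tuple(x[a] for a in attrs)
    idx = partition(rows, attrs).get(key, [])
    if not idx:
        return {}  # x matches no training object on attrs
    counts = Counter(labels[i] for i in idx)
    return {label: c / len(idx) for label, c in counts.items()}


def two_layer_classify(rows, labels, attrs, x, fallback, threshold=1.0):
    """Layer 1: accept the label if its membership reaches the threshold.
    Layer 2: otherwise defer to fallback (hypothetical stand-in for a
    Bayesian network classifier such as TAN)."""
    mu = rough_membership(rows, labels, attrs, x)
    if mu:
        best = max(mu, key=mu.get)
        if mu[best] >= threshold:
            return best, "rough"
    return fallback(x), "fallback"


# Toy data (invented for illustration): attribute "a" determines the label,
# attribute "b" carries no information about it.
rows = [{"a": 0, "b": 0}, {"a": 0, "b": 1}, {"a": 1, "b": 0}, {"a": 1, "b": 1}]
labels = ["neg", "neg", "pos", "pos"]
```

In this toy setting, selecting `["a"]` gives a conditional entropy of zero, so every object is settled in the first layer; selecting `["b"]` leaves each membership at 0.5, so every object would be passed to the fallback classifier. The threshold value and the way features are searched are design choices the abstract does not specify.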
