Abstract

Many machine learning algorithms assume that attributes are independent, yet they are often used in domains where that assumption does not hold. The Naive Bayesian (NB) classifier assumes that all features are conditionally independent given the class label. In this paper, attribute dependencies were analyzed using the Chi-Square test, and Principal Component Analysis (PCA) was carried out on the whole dataset to obtain a set of features. We also applied PCA to the independent attributes of the data, with a view to obtaining a more compressed set of features that may lead to reliable accuracy for the NB classifier. The performance of the classifier was evaluated for the combined approach as well as for the Chi-Square-only and PCA-only approaches, with varying dataset sizes. For the dataset used, reducing dimensionality according to the Chi-Square independence test, as well as the combined approach, yielded much better performance than the PCA-only approach; when time is also considered, the combined approach is better.

Cite this Article: Biprodip Pal, Sadia Zaman, Md. Abu Hasan et al. Chi-Square Statistic and Principal Component Analysis Based Compressed Feature Selection Approach for Naive Bayesian Classifier. Journal of Artificial Intelligence Research & Advances. 2015; 2(2): 16–23p.
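
As a rough illustration of the Chi-Square dependency analysis mentioned in the abstract, the sketch below computes the Pearson Chi-Square statistic for a pair of categorical attributes in plain Python. It is not the authors' implementation: the function name `chi_square_statistic`, the toy attributes `outlook` and `windy`, and the 0.05 significance level are all assumptions made for illustration.

```python
from collections import Counter


def chi_square_statistic(xs, ys):
    """Return (statistic, degrees_of_freedom) for two categorical attributes.

    Hypothetical helper for illustration; it builds the contingency table
    from paired observations and sums (observed - expected)^2 / expected.
    """
    n = len(xs)
    joint = Counter(zip(xs, ys))   # observed joint counts
    row = Counter(xs)              # marginal counts of the first attribute
    col = Counter(ys)              # marginal counts of the second attribute

    stat = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n   # count expected under independence
            observed = joint.get((a, b), 0)
            stat += (observed - expected) ** 2 / expected

    dof = (len(row) - 1) * (len(col) - 1)
    return stat, dof


# Toy data, invented for this example only.
outlook = ["sunny", "sunny", "rain", "rain", "sunny", "rain", "sunny", "rain"]
windy = ["yes", "no", "yes", "no", "yes", "no", "no", "yes"]

# Critical value of the Chi-Square distribution for 1 degree of freedom
# at the 0.05 significance level (an assumed threshold).
CRITICAL_1DOF_005 = 3.841

stat, dof = chi_square_statistic(outlook, windy)
looks_dependent = stat > CRITICAL_1DOF_005
```

Under this sketch, attribute pairs whose statistic stays below the critical value would be treated as independent and could then be passed on to PCA, which is one plausible reading of the combined approach described above.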
