Abstract

Biomedical data classification is challenging because the data are usually large, noisy and imbalanced. Noise in particular can degrade a system in terms of classification accuracy, the time needed to build a classifier and the size of the classifier. Most existing learning algorithms therefore integrate various mechanisms to improve learning in noisy environments, but noise can still have a serious negative impact. A more reasonable solution may be to apply a preprocessing step that handles noisy instances before a learner is formed. We therefore introduce a method called double learning to improve the classification performance of our model. To the best of the authors' knowledge, most previous work isolates the noisy instances and then uses the normal (noise-free) instances for model construction (training). This increases the computational cost of model construction for active learners and the total computation time for passive learners. It also ignores minority-class instances, which leads to misclassification of minority-group instances at test time. The main idea of this paper is to construct the model from the noisy instances instead: only the instances identified as noisy are used for model construction, not the normal (noise-free) data. This reduces the number of training instances, which shortens model construction time, and it improves classification performance. Because noisy instances are used for model construction, the entire working logic of naive Bayes is reversed. The resulting method, called complement naive Bayesian (CNB), uses the idea of complement-based learning to improve accuracy. Finally, the performance of the proposed CNB is compared with naive Bayes and several other classification algorithms on the single photon emission computed tomography (SPECT), Indian liver patient, Wilt and Tic-Tac-Toe endgame datasets.
The experimental results show that the proposed approach is promising in terms of computation time and accuracy on both the balanced and the imbalanced datasets used.
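
The abstract does not say how noisy instances are identified or how exactly the naive Bayes logic is reversed, so the following is only a minimal sketch of one possible reading, not the authors' implementation. It assumes categorical features, treats an instance as noisy when a first-pass naive Bayes model trained on all data disagrees with its label, trains a second model on those instances only, and then predicts the class with the lowest score. All function names (`fit_nb`, `fit_on_noise`, `predict_reversed`) and the toy data are hypothetical.

```python
import math
from collections import Counter


def fit_nb(rows, labels, classes, values, alpha=1.0):
    """Fit a categorical naive Bayes model with Laplace smoothing."""
    class_counts = Counter(labels)
    feat_counts = Counter()
    for row, y in zip(rows, labels):
        for j, v in enumerate(row):
            feat_counts[(y, j, v)] += 1
    return {"classes": classes, "values": values, "alpha": alpha,
            "n": len(labels), "class_counts": class_counts,
            "feat_counts": feat_counts}


def log_scores(model, row):
    """Return the smoothed joint log-probability of each class for one row."""
    a = model["alpha"]
    scores = {}
    for c in model["classes"]:
        n_c = model["class_counts"][c]
        s = math.log((n_c + a) / (model["n"] + a * len(model["classes"])))
        for j, v in enumerate(row):
            s += math.log((model["feat_counts"][(c, j, v)] + a)
                          / (n_c + a * len(model["values"][j])))
        scores[c] = s
    return scores


def predict_nb(model, row):
    """Standard naive Bayes decision: highest-scoring class."""
    scores = log_scores(model, row)
    return max(scores, key=scores.get)


def predict_reversed(model, row):
    """Assumed reversed decision: lowest-scoring class under the noise model."""
    scores = log_scores(model, row)
    return min(scores, key=scores.get)


def fit_on_noise(rows, labels, alpha=1.0):
    """First pass flags noisy instances; second pass trains on them only."""
    classes = sorted(set(labels))
    values = [sorted({r[j] for r in rows}) for j in range(len(rows[0]))]
    first_pass = fit_nb(rows, labels, classes, values, alpha)
    # Assumption: an instance is "noisy" if the first-pass model disagrees
    # with its recorded label.
    noisy = [(r, y) for r, y in zip(rows, labels)
             if predict_nb(first_pass, r) != y]
    model = fit_nb([r for r, _ in noisy], [y for _, y in noisy],
                   classes, values, alpha)
    return model, noisy


# Toy data: "a" mostly maps to class 0 and "b" to class 1, plus two
# instances whose labels contradict that pattern.
rows = [("a",)] * 4 + [("b",)] * 4 + [("a",), ("b",)]
labels = [0] * 4 + [1] * 4 + [1, 0]

model, noisy = fit_on_noise(rows, labels)
print(len(noisy), predict_reversed(model, ("a",)), predict_reversed(model, ("b",)))
```

In this toy case the second model is built from two instances rather than ten, which illustrates where the claimed saving in model construction time would come from. Whether the prior should also be reversed, and how ties or more than two classes are handled, are design choices this sketch does not settle.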
