Abstract
Text classification reduces time and space complexity by dividing the complete task into different classes. The main problem with text classification is the vast number of features extracted from textual data. Pre-processed datasets have many features, some of which are undesirable and act only as noise. In this paper, a novel approach for optimal text classification based on a nature-inspired algorithm and an ensemble classifier is proposed. In the proposed model, feature selection is performed with the Biogeography Based Optimization (BBO) algorithm, and classification with ensemble classifiers (Bagging). Using ensemble classifiers delivers better performance for optimal text classification than an individual classifier, and hence improves accuracy. Ensemble classifiers compensate for the weaknesses of individual classifiers. Individual classifiers are unable to improve the classification results to the extent an ensemble classifier does. The features selected by the BBO algorithm are classified into various classes using six machine learning classifiers. The experimental results are computed on ten text classification datasets taken from the UCI repository and one real-time dataset from an airline. Four measures, namely Accuracy, Precision, Recall and F-measure, are used to validate the performance of our model with ten-fold cross-validation. For the feature selection process, a comparison is performed among state-of-the-art algorithms available in the literature. Results show that BBO for feature selection outperforms other similar nature-based optimization techniques. Our proposed approach of BBO with an ensemble classifier is also compared with techniques proposed by other researchers, and the results are analyzed quantitatively and qualitatively.
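To make the feature selection step concrete, the following is a minimal sketch of a simplified binary BBO, not the authors' implementation. Each habitat is a 0/1 mask over features; better-ranked habitats emigrate (share) features more often and immigrate (accept) them less often. The function names, the linear migration model, the parameter values, and the toy fitness function are all assumptions made for illustration. In a setting like the one described above, the fitness would presumably be the ten-fold cross-validated accuracy of a bagging ensemble trained on the selected features.

```python
import random


def bbo_select(fitness, n_features, pop_size=20, generations=30,
               mutation_rate=0.02, seed=0):
    """Simplified binary BBO (illustrative): returns the best 0/1 feature mask found."""
    rng = random.Random(seed)
    # Each habitat is a random binary mask over the features.
    pop = [[rng.randint(0, 1) for _ in range(n_features)]
           for _ in range(pop_size)]
    best = max(pop, key=fitness)[:]
    for _ in range(generations):
        # Rank habitats from most to least suitable (higher fitness first).
        pop.sort(key=fitness, reverse=True)
        # Assumed linear migration model: good habitats emigrate more
        # and immigrate less.
        mu = [(pop_size - i) / (pop_size + 1) for i in range(pop_size)]
        lam = [1.0 - m for m in mu]
        new_pop = []
        for i, habitat in enumerate(pop):
            child = habitat[:]
            for j in range(n_features):
                if rng.random() < lam[i]:
                    # Pick a source habitat in proportion to its emigration rate.
                    src = rng.choices(range(pop_size), weights=mu)[0]
                    child[j] = pop[src][j]
                if rng.random() < mutation_rate:
                    child[j] = 1 - child[j]  # flip the bit
            new_pop.append(child)
        # Elitism: the best mask so far replaces the worst-ranked slot.
        new_pop[-1] = best[:]
        pop = new_pop
        cand = max(pop, key=fitness)
        if fitness(cand) > fitness(best):
            best = cand[:]
    return best


def toy_fitness(mask):
    """Hypothetical stand-in for classifier accuracy:
    features 0-4 are treated as informative, the rest as noise."""
    return sum(mask[:5]) - 0.1 * sum(mask[5:])


if __name__ == "__main__":
    mask = bbo_select(toy_fitness, n_features=20)
    print("selected features:", [i for i, bit in enumerate(mask) if bit])
```

Passing the fitness as a callable keeps the search independent of the classifier, so the same loop could wrap any of the classifiers compared in the paper.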