Abstract

Text classification is a core technology in text data mining. Feature selection, a key step in text classification, reduces the high-dimensional feature set and directly affects final classification performance. The most widely used text feature selection methods score each feature's importance for classification with an evaluation function and then select the top-ranked features until the required number is reached. Because this approach ignores correlations between features and the effect of combining them, it cannot guarantee the best classification performance. To address this problem, this paper proposes a chaotic antlion feature selection algorithm (CAFSA). The main contributions are: (1) We propose a chaotic antlion algorithm (CAA) based on a quasi-opposition learning mechanism and a chaos strategy, and compare it with four other algorithms on 11 benchmark functions; CAA converges faster and achieves the highest optimization accuracy. (2) We study the performance of CAFSA, which uses CAA for feature selection, with different learning models: decision tree, Naive Bayes, and SVM classifiers. (3) We compare CAFSA with eight other feature selection methods on three Chinese datasets. The experimental results show that CAFSA reduces the number of features and improves classification accuracy, outperforming the other feature selection methods.
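
The limitation of per-feature ranking described in the abstract can be illustrated with a small sketch. This is not the paper's method or data: the evaluation function (mutual information), the helper names `mutual_information` and `top_k_features`, and the eight-row toy dataset are all assumptions chosen for illustration. The label is the XOR of features 0 and 1, so each of them scores zero on its own even though the pair determines the label exactly, while feature 2 is only a noisy copy of the label.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


def top_k_features(rows, labels, k):
    """Rank features by their individual score and keep the k best (filter approach)."""
    n_features = len(rows[0])
    scores = [
        mutual_information([row[j] for row in rows], labels)
        for j in range(n_features)
    ]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranked[:k], scores


# Hypothetical toy data: label = f0 XOR f1; f2 agrees with the label in 6 of 8 rows.
rows = [
    (0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0),
    (0, 0, 0), (0, 1, 1), (1, 0, 0), (1, 1, 1),
]
labels = [0, 1, 1, 0, 0, 1, 1, 0]

selected, scores = top_k_features(rows, labels, k=1)
pair_score = mutual_information([(row[0], row[1]) for row in rows], labels)

print("individual scores:", scores)
print("selected by ranking:", selected)
print("score of features 0 and 1 taken together:", pair_score)
```

Under these assumptions, the ranking keeps only the noisy feature 2 and discards the two features that jointly carry all the label information. Search-based methods that evaluate whole feature subsets, the family CAFSA belongs to, are meant to avoid this kind of miss.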
