Abstract

Text classification is the task of assigning labels to unlabeled text. It has many applications, such as sentiment analysis, document classification, and fake news detection, and machine learning (ML) methods have been widely used for it in recent years. A fundamental problem with these ML approaches is that they depend heavily on feature selection. This research evaluates several such models and feature selection methods. Past studies conclude that no single feature selection method works well for all classification tasks, and Urdu is, in addition, a resource-poor language. In this study, we propose a hybrid feature selection approach for Roman Urdu text that not only reduces the dimensionality of the feature map but also increases the accuracy of ML models. On datasets of 11,000 and 20,000 records, the Support Vector Classifier, Naive Bayes, and Decision Tree achieved accuracies of 80.81%, 72.94%, and 76.78%, respectively, among the methods tested. These are the best accuracy values achieved by each classifier, obtained with the hybrid features ChiSAE, CorrelationAE, and GainRAE. In future work, text classification will be applied to better understand human self-analysis, and deep learning methods will be used to obtain more reliable results.
