Abstract

This paper studies the effect of different term weighting techniques on the categorization of Arabic text complaints. Farmers’ complaints, written in an unstructured and ungrammatical way, are analyzed and classified by crop name. First, the complaints are preprocessed by removing stop words, correcting writing mistakes, and stemming. Some domain-specific special cases that may affect classification performance are also handled. Term weighting schemes such as TF, TF–IDF, and TF–ICF are then used to form representative vectors for the complaints, and these vectors are used to train a classifier. Finally, the trained classifier is used to classify an unlabeled complaint. A dataset containing more than 5300 Arabic complaints pertaining to 8 crops has been created, and a KNN classifier has been used for classification. The experiments show that the term weighting techniques differ in stability. Further, a comparative analysis of the four feature selection techniques is presented.
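As a rough illustration of the pipeline described in the abstract, the sketch below weights already-preprocessed tokens with TF, TF–IDF, or TF–ICF and classifies a query with a cosine-similarity KNN vote. It is not the authors' implementation. The function names, the toy English tokens standing in for stemmed Arabic terms, and the exact formulas (IDF as log(N/df), ICF as log(C/cf), where cf is the number of classes containing the term) are all assumptions made for this example.

```python
import math
from collections import Counter

def term_weights(docs, labels, scheme="tf"):
    """Return a global weight per term (assumed formulas, see lead-in)."""
    vocab = {t for d in docs for t in d}
    if scheme == "tf":
        return {t: 1.0 for t in vocab}
    if scheme == "tfidf":
        # df = number of documents containing the term
        df = Counter(t for d in docs for t in set(d))
        return {t: math.log(len(docs) / df[t]) for t in vocab}
    if scheme == "tficf":
        # cf = number of classes (crops) whose documents contain the term
        class_terms = {c: set() for c in set(labels)}
        for d, c in zip(docs, labels):
            class_terms[c].update(d)
        cf = Counter(t for terms in class_terms.values() for t in terms)
        return {t: math.log(len(class_terms) / cf[t]) for t in vocab}
    raise ValueError(f"unknown scheme: {scheme}")

def vectorize(doc, weights):
    """Sparse vector: raw term frequency times the global term weight."""
    tf = Counter(doc)
    return {t: tf[t] * weights[t] for t in tf if t in weights}

def cosine(a, b):
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_predict(query, docs, labels, scheme="tficf", k=3):
    """Majority vote among the k training documents most similar to query."""
    weights = term_weights(docs, labels, scheme)
    train = [vectorize(d, weights) for d in docs]
    q = vectorize(query, weights)
    ranked = sorted(zip(train, labels),
                    key=lambda pair: cosine(q, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy corpus: tokens assumed to be already cleaned and stemmed.
docs = [
    ["wheat", "leaf", "yellow"],
    ["wheat", "rust", "leaf"],
    ["tomato", "fruit", "rot"],
    ["tomato", "leaf", "spot"],
]
labels = ["wheat", "wheat", "tomato", "tomato"]

predicted = knn_predict(["wheat", "rust"], docs, labels, scheme="tficf", k=3)
```

In this toy corpus the term "leaf" occurs in both classes, so its TF–ICF weight is log(2/2) = 0 and it contributes nothing to the similarity. Under TF–IDF it keeps a small positive weight because it is absent from one document.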
