Abstract

Most supervised term weighting (STW) schemes can only be applied to binary text classification tasks such as sentiment analysis (SA), rather than to text classification with more than two categories. In this paper, we propose a new supervised term weighting scheme for multi-class text categorization. The scheme, called inverse term entropy (ite), measures how each term is distributed across all the categories according to the definition of entropy in information theory. We present experimental results obtained on the 20NewsGroup dataset with a popular classifier learning method, the support vector machine (SVM). Our weighting scheme ite achieves the best classification accuracy compared with other existing methods, and it also shows the most stable performance as the number of training samples is reduced. Furthermore, our method has a built-in property that prevents over-weighting in STW. Over-weighting, a concept we proposed in our earlier work on supervised term weighting and re-introduce here, is caused by improperly weighted singular terms and excessively large ratios between term weights, and it can degrade the performance of text classification tasks.
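The abstract does not reproduce the ite formula itself, so the following is only a minimal Python sketch of an entropy-based supervised term weight. It assumes that ite inverts the normalized Shannon entropy of each term's distribution over the categories; the function name, the normalization by the maximum entropy, and the toy counts are illustrative assumptions, not the paper's exact definition.

```python
import numpy as np

def inverse_term_entropy(term_class_counts, eps=1e-12):
    """Sketch of an entropy-based supervised term weight (assumed form).

    term_class_counts: array of shape (n_terms, n_classes), where entry
    (t, c) counts occurrences of term t in documents of category c.
    A term spread evenly across categories (high entropy) gets a weight
    near 0; a term concentrated in few categories (low entropy) gets a
    weight near 1.
    """
    counts = np.asarray(term_class_counts, dtype=float)
    # Per-term probability distribution over the categories.
    probs = counts / (counts.sum(axis=1, keepdims=True) + eps)
    # Shannon entropy of each term's category distribution.
    entropy = -(probs * np.log2(probs + eps)).sum(axis=1)
    # Normalize by the maximum possible entropy, log2(n_classes),
    # then invert so discriminative terms receive larger weights.
    max_entropy = np.log2(counts.shape[1])
    return 1.0 - entropy / max_entropy

# Toy example: 3 terms, 4 categories.
counts = np.array([
    [50,  1,  0,  2],   # concentrated in one category -> weight near 1
    [10, 10, 10, 10],   # uniform across categories    -> weight near 0
    [ 0, 30, 25,  0],   # split between two categories -> medium weight
])
print(inverse_term_entropy(counts))
```

In a full pipeline, such a weight would typically be multiplied by a local factor such as term frequency before training the SVM, but that combination is likewise an assumption here.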
