Abstract

Most supervised term weighting (STW) schemes can only be applied to binary text classification tasks such as sentiment analysis (SA), rather than to text classification with more than two categories. In this paper, we propose a new supervised term weighting scheme for multi-class text categorization. The scheme, called inverse term entropy (ite), measures how each term is distributed across all the categories according to the definition of entropy in information theory. We present experimental results obtained on the 20NewsGroup dataset with a popular classifier learning method, the support vector machine (SVM). Our weighting scheme ite achieves the best classification accuracy compared with other existing methods, and it also shows the most stable performance as the number of training samples is reduced. Furthermore, our method has a built-in property that prevents over-weighting in STW. Over-weighting, a concept we proposed in our earlier work on supervised term weighting and re-introduce here, is caused by improperly weighted singular terms and excessively large ratios between term weights, and it can degrade the performance of text classification tasks.
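The abstract does not reproduce the ite formula itself, so the following is only a minimal Python sketch of an entropy-based supervised term weight. It assumes that ite inverts the normalized Shannon entropy of each term's distribution over the categories; the function name, the normalization by the maximum entropy, and the toy counts are illustrative assumptions, not the paper's exact definition.

```python
import numpy as np

def inverse_term_entropy(term_class_counts, eps=1e-12):
    """Sketch of an entropy-based supervised term weight (assumed form).

    term_class_counts: array of shape (n_terms, n_classes), where entry
    (t, c) counts occurrences of term t in documents of category c.
    A term spread evenly across categories (high entropy) gets a weight
    near 0; a term concentrated in few categories (low entropy) gets a
    weight near 1.
    """
    counts = np.asarray(term_class_counts, dtype=float)
    # Per-term probability distribution over the categories.
    probs = counts / (counts.sum(axis=1, keepdims=True) + eps)
    # Shannon entropy of each term's category distribution.
    entropy = -(probs * np.log2(probs + eps)).sum(axis=1)
    # Normalize by the maximum possible entropy, log2(n_classes),
    # then invert so discriminative terms receive larger weights.
    max_entropy = np.log2(counts.shape[1])
    return 1.0 - entropy / max_entropy

# Toy example: 3 terms, 4 categories.
counts = np.array([
    [50,  1,  0,  2],   # concentrated in one category -> weight near 1
    [10, 10, 10, 10],   # uniform across categories    -> weight near 0
    [ 0, 30, 25,  0],   # split between two categories -> medium weight
])
print(inverse_term_entropy(counts))
```

In a full pipeline, such a weight would typically be multiplied by a local factor such as term frequency before training the SVM, but that combination is likewise an assumption here.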
