Abstract

TF-IDF is a common method for weighting features in text categorization. However, the traditional algorithm ignores how feature terms are distributed between classes (inter-class) and within a class (intra-class), so it cannot effectively reflect a term's distribution across categories when assigning weights. To address this, the inter-class and intra-class information entropy of each feature term, which describes that distribution, is used to revise the TF-IDF weight. Simulation experiments show that the improved TF-IDF algorithm achieves better classification results than the traditional TF-IDF algorithm.
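
The abstract does not state the revised weighting formula, so the following is only a minimal Python sketch of the general idea, not the paper's method. It assumes that a term concentrated in few classes (low inter-class entropy) and spread evenly over the documents of its own class (high intra-class entropy) should receive a higher weight. The function names, the smoothed IDF, the toy `corpus`, and the combining factor `(1 - H_inter) * (1 + H_intra)` are all hypothetical choices made for illustration.

```python
import math


def entropy(counts):
    """Shannon entropy (natural log) of a list of non-negative counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def normalized_entropy(counts):
    """Entropy divided by its maximum, log(n), so the result lies in [0, 1]."""
    n = len(counts)
    if n <= 1:
        return 0.0
    return entropy(counts) / math.log(n)


def inter_class_entropy(term, docs_by_class):
    """How evenly the term's occurrences are spread across classes."""
    counts = [sum(doc.count(term) for doc in docs)
              for docs in docs_by_class.values()]
    return normalized_entropy(counts)


def intra_class_entropy(term, docs):
    """How evenly the term's occurrences are spread over one class's documents."""
    return normalized_entropy([doc.count(term) for doc in docs])


def entropy_weighted_tfidf(term, doc, label, docs_by_class):
    """TF-IDF scaled by an ASSUMED entropy factor (not the paper's formula)."""
    all_docs = [d for docs in docs_by_class.values() for d in docs]
    tf = doc.count(term) / len(doc)
    df = sum(1 for d in all_docs if term in d)
    # Smoothed IDF, chosen here so the value is always positive.
    idf = math.log((1 + len(all_docs)) / (1 + df)) + 1
    h_inter = inter_class_entropy(term, docs_by_class)
    h_intra = intra_class_entropy(term, docs_by_class[label])
    # Hypothetical combination: reward class-specific, evenly used terms.
    return tf * idf * (1 - h_inter) * (1 + h_intra)


# Toy corpus (made up for this sketch): tokenized documents grouped by class.
corpus = {
    "sports": [["ball", "goal", "team"], ["ball", "team", "win"]],
    "tech": [["code", "bug", "team"], ["code", "chip", "team"]],
}
doc = corpus["sports"][0]
w_ball = entropy_weighted_tfidf("ball", doc, "sports", corpus)
w_team = entropy_weighted_tfidf("team", doc, "sports", corpus)
```

In this toy corpus, "team" appears equally often in both classes, so its inter-class entropy is maximal and the assumed factor drives its weight to zero, whereas "ball" occurs only in the "sports" class and keeps a positive weight. Plain TF-IDF would distinguish the two terms only through document frequency, without looking at class labels.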
