Abstract

This paper proposes the use of non-extensive entropy for text classification, in which the classifier estimates the conditional distribution of the class variable given the document. The underlying principle of maximum entropy is that, without external knowledge, one should prefer distributions that are uniform. Two models based on the maximum entropy principle are proposed. The first extends Shannon entropy to non-extensive entropy to simplify the form of the classifier; the second introduces high-level constraints into the non-extensive model, imposing constraints on pairs of entities. The model with high-level constraints builds semantic relations between word pairs, with the aim of improving classification accuracy. Experiments on the 20 Newsgroups dataset demonstrate the advantage of the non-extensive model and of the non-extensive model with high-level constraints.
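As a minimal illustration of the quantity involved: non-extensive entropy commonly refers to the Tsallis form S_q(p) = (1 − Σ p_i^q)/(q − 1), which recovers Shannon entropy as q → 1 and, like Shannon entropy, is maximized by the uniform distribution. The sketch below assumes this standard Tsallis definition; the entropic index q and the example distributions are illustrative, not taken from the paper.

```python
import math

def tsallis_entropy(probs, q):
    """Tsallis (non-extensive) entropy S_q = (1 - sum(p_i**q)) / (q - 1).

    As q -> 1 this reduces to Shannon entropy -sum(p_i * ln(p_i)).
    """
    if abs(q - 1.0) < 1e-12:
        return -sum(p * math.log(p) for p in probs if p > 0)
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)

# The maximum entropy principle: absent constraints, the uniform
# distribution is preferred because it attains the largest entropy.
uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.7, 0.1, 0.1, 0.1]

print(tsallis_entropy(uniform, 2.0))  # 0.75
print(tsallis_entropy(skewed, 2.0))   # 0.48 -> lower than uniform
```

For q = 2 the uniform distribution over four classes gives (1 − 4·0.25²)/1 = 0.75, while any skewed distribution scores lower, which is the sense in which the principle "prefers distributions that are uniform."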
