A Document Level Measure for Text Categorization

V Mary Amala Bai, D Manimegalai

doi:10.15866/irecos.v8i6.3328

Abstract

Term weighting is a vital step in automatic text categorization. Previous studies have shown that the term weighting technique contributes more to classification accuracy than the choice of classifier does, so this work concentrates on term weighting schemes and proposes a new supervised scheme for text categorization. The frequency of each term in a document is expressed as the probability of that term in the document, which gives the proportion of the document that the term accounts for. This information says a good deal about the category of the document. Summing the probability of a term over all the documents of a class yields a variable that can be used for term weighting in classification. It is essentially a document-level variable, because it is built from the probability of a term in a document. The resulting measure is named td (terms in a document). Evaluated on the Reuters-21578 and 20 Newsgroups datasets, td showed an increase in performance compared to tf, idf and rf. Compared to rf, the measure works well with both SVM (a binary classifier) and centroid-based classifiers (multiclass classifiers).
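The quantity the abstract describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not give the exact formula, so the sketch takes the probability of a term in a document to be its count divided by the document length, and the per-class value to be the plain sum of those probabilities. How the paper turns that sum into a final term weight is not stated here and is not attempted. The function names `term_probabilities` and `td_per_class` are hypothetical.

```python
from collections import Counter, defaultdict


def term_probabilities(tokens):
    """Proportion of each term within one document (assumed: count / length)."""
    counts = Counter(tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {term: n / total for term, n in counts.items()}


def td_per_class(docs):
    """Sum each term's in-document probability over the documents of a class.

    docs: iterable of (tokens, label) pairs.
    Returns {label: {term: summed probability}}.
    """
    td = defaultdict(lambda: defaultdict(float))
    for tokens, label in docs:
        for term, p in term_probabilities(tokens).items():
            td[label][term] += p
    return {label: dict(terms) for label, terms in td.items()}


# Toy corpus, invented purely for illustration.
docs = [
    (["oil", "price", "oil"], "crude"),
    (["oil", "tanker"], "crude"),
    (["wheat", "price"], "grain"),
]
td = td_per_class(docs)
```

In this toy corpus, "oil" accumulates 2/3 from the first document and 1/2 from the second, so its summed value for the class "crude" is larger than that of "price", which appears only once in that class.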

Journal: International Review on Computers and Software
Publication Date: Jun 30, 2013
Citations: 1
