An Approach to Enhance Text Categorization through Shrinkage in a Hierarchy of Modules

Takudzwa Fadziso

doi:10.18034/abcjar.v8i2.562

Abstract

Most organizations design and develop a large volume of programmed documents as an essential part of their internal and external activities. When documents are organized into a large number of subject categories, the categories are frequently arranged in a hierarchy. Newsgroup and Yahoo databases are two examples. This article shows that the accuracy of a naïve Bayes text classifier can be significantly improved by taking advantage of a hierarchy of categories. A statistical technique known as shrinkage is adopted, which smooths the parameter estimates of a data-sparse child with those of its parent in order to obtain more robust parameter estimates. Test results on three real-world datasets from Yahoo, UseNet, and shared web pages show improved performance, with about a 29% reduction in error over the conventional flat classifier.
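
The shrinkage idea described in the abstract can be illustrated with a minimal sketch. It assumes a two-level hierarchy (one child category and its parent), a toy three-word vocabulary, and fixed mixture weights chosen by hand; the function names, the counts, and the weights are all hypothetical and are not taken from the article, which may determine the weights differently.

```python
from collections import Counter


def word_probs(counts, vocab):
    """Maximum-likelihood word probabilities for one category (no smoothing)."""
    total = sum(counts.get(w, 0) for w in vocab)
    return {w: (counts.get(w, 0) / total if total else 0.0) for w in vocab}


def shrink(child_counts, parent_counts, vocab, weights=(0.6, 0.3, 0.1)):
    """Mix child, parent, and uniform estimates.

    The weights are illustrative assumptions and must sum to 1 so that
    the result is still a probability distribution over the vocabulary.
    """
    lam_child, lam_parent, lam_uniform = weights
    child = word_probs(child_counts, vocab)
    parent = word_probs(parent_counts, vocab)
    uniform = 1.0 / len(vocab)
    return {
        w: lam_child * child[w] + lam_parent * parent[w] + lam_uniform * uniform
        for w in vocab
    }


# Hypothetical toy data: the child category has seen only two word tokens,
# while its parent has pooled counts from all of its children.
vocab = ["ball", "goal", "vote"]
child_counts = Counter({"ball": 2})
parent_counts = Counter({"ball": 5, "goal": 4, "vote": 1})

unsmoothed = word_probs(child_counts, vocab)
shrunk = shrink(child_counts, parent_counts, vocab)
```

In this toy case the unsmoothed child estimate gives "goal" and "vote" a probability of zero, whereas the shrunken estimate borrows strength from the parent and keeps every word at a nonzero probability, which is the robustness the abstract refers to.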

Full Text