CDM: an approach to learning in text categorization

J.L. Goldberg

doi:10.1109/tai.1995.479592

Abstract

The category discrimination method (CDM) is a new learning algorithm designed for text categorization. It is motivated by the statistical problems that arise when natural language text is used as input to existing machine learning algorithms: too much noise, too many features, and skewed distributions. CDM is based on research results about how humans learn categories and concepts in relation to contrasting concepts. Its essential formula is cue validity, borrowed from cognitive psychology, which is used to select, from all possible single-word features, the 'best' predictors of a given category. The hypothesis that CDM outperforms two non-domain-specific algorithms, Bayesian classification and decision tree learners, is tested empirically.
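To make the feature-selection idea concrete, the following is a minimal sketch, not the paper's implementation. The abstract does not state the exact formula, so the sketch assumes the common cognitive-psychology reading of cue validity as the conditional probability P(category | word), estimated from document frequencies. The function names, the `min_df` threshold, and the toy documents are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def document_frequencies(docs, labels, category):
    """Count, per word, the documents containing it: overall and within `category`."""
    overall = Counter()
    in_category = Counter()
    for text, label in zip(docs, labels):
        # Use a set so each word counts at most once per document.
        for word in set(text.lower().split()):
            overall[word] += 1
            if label == category:
                in_category[word] += 1
    return overall, in_category


def cue_validity(docs, labels, category):
    """Estimate P(category | word) for every word (assumed definition)."""
    overall, in_category = document_frequencies(docs, labels, category)
    return {word: in_category[word] / overall[word] for word in overall}


def top_cues(docs, labels, category, k=3, min_df=2):
    """Return up to k words with the highest cue validity.

    `min_df` is a hypothetical noise filter: words seen in fewer than
    `min_df` documents are skipped. Ties are broken alphabetically.
    """
    overall, _ = document_frequencies(docs, labels, category)
    scores = cue_validity(docs, labels, category)
    candidates = [w for w in scores if overall[w] >= min_df]
    return sorted(candidates, key=lambda w: (-scores[w], w))[:k]


if __name__ == "__main__":
    docs = [
        "wheat prices rise",
        "wheat harvest falls",
        "oil prices rise",
        "oil exports fall",
    ]
    labels = ["grain", "grain", "energy", "energy"]
    print(cue_validity(docs, labels, "grain"))
    print(top_cues(docs, labels, "grain", k=1))
```

In this toy corpus, "wheat" appears only in grain documents and so scores highest for that category, while "prices" appears equally in both categories and carries no discriminating information. A word's score depends on the contrasting categories as well as the target one, which is the sense in which the selection is discriminative.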
