Abstract

The category discrimination method (CDM) is a new learning algorithm designed for text categorization. It is motivated by the statistical problems that arise when natural language text is used as input to existing machine learning algorithms: too much noise, too many features, and skewed distributions. The CDM is based on research into how humans learn categories and concepts in relation to contrasting concepts. Its central formula is cue validity, borrowed from cognitive psychology, which is used to select, from all possible single-word features, the 'best' predictors of a given category. The hypothesis that the CDM outperforms two non-domain-specific algorithms, Bayesian classification and decision tree learners, is tested empirically.
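
To make the idea of cue validity concrete, here is a minimal sketch in Python. It assumes the standard cognitive-psychology reading of cue validity as the conditional probability of a category given a cue, P(category | word), estimated from document counts. The abstract does not give the CDM's exact formula, so this is an illustration of the general notion, not the authors' method; the function names `cue_validity` and `best_predictors` and the input layout are hypothetical.

```python
from collections import Counter, defaultdict


def cue_validity(docs):
    """Estimate P(category | word) from labelled documents.

    docs: iterable of (category, list_of_words) pairs (assumed layout).
    Returns a dict mapping (word, category) to the fraction of documents
    containing the word that belong to that category.
    """
    docs_with_word = Counter()                 # documents containing each word
    docs_with_word_in_cat = defaultdict(Counter)
    for category, words in docs:
        for word in set(words):                # count each word once per document
            docs_with_word[word] += 1
            docs_with_word_in_cat[word][category] += 1
    return {
        (word, category): count / docs_with_word[word]
        for word, per_cat in docs_with_word_in_cat.items()
        for category, count in per_cat.items()
    }


def best_predictors(docs, category, k=3):
    """Return the k single-word features with the highest cue validity
    for one category, ties broken alphabetically (an arbitrary choice)."""
    scores = cue_validity(docs)
    ranked = [(w, v) for (w, c), v in scores.items() if c == category]
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked[:k]


if __name__ == "__main__":
    # Toy corpus, invented purely for illustration.
    toy_docs = [
        ("sports", ["goal", "team", "win"]),
        ("sports", ["team", "score"]),
        ("finance", ["stock", "win"]),
        ("finance", ["stock", "market"]),
    ]
    for word, validity in best_predictors(toy_docs, "sports"):
        print(word, validity)
```

In this toy setup a word that appears in only one document of a category still receives the maximum score, so a practical selector would likely need a frequency threshold or smoothing; whether and how the CDM handles this is not stated in the abstract.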
