Abstract

The Category Discrimination Method (CDM) is a new machine learning algorithm designed specifically for text categorization. It is motivated by the statistical problems that arise when natural language text is used as input to existing machine learning algorithms: too much noise, too many features, and skewed distributions. CDM is based on research results about how humans learn categories and concepts vis-à-vis contrasting concepts. Its essential formula is cue validity, borrowed from cognitive psychology and used to select, from all possible single-word features, the best predictors of a given category. The hypothesis that CDM's performance will exceed that of two non-domain-specific algorithms, Bayesian classification and decision tree learners, is empirically tested.
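
To make the notion of cue validity concrete, here is a minimal sketch that assumes the standard cognitive-psychology definition: the cue validity of a word for a category is the conditional probability P(category | word), estimated here from document counts. The abstract does not give CDM's exact formula, so the function names (`cue_validity`, `top_cues`), the document-frequency estimate, and the toy data are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter, defaultdict


def cue_validity(docs):
    """Estimate P(category | word) from labeled documents.

    docs: iterable of (category, list_of_words) pairs.
    Returns a dict mapping (word, category) to the fraction of
    documents containing the word that belong to the category.
    Assumption: document frequency is used as the estimate.
    """
    word_cat = defaultdict(Counter)  # word -> count of documents per category
    for category, words in docs:
        for word in set(words):  # count each word once per document
            word_cat[word][category] += 1

    validity = {}
    for word, cats in word_cat.items():
        total = sum(cats.values())
        for category, n in cats.items():
            validity[(word, category)] = n / total
    return validity


def top_cues(validity, category, k):
    """Return the k single-word features with the highest cue validity
    for the given category (ties broken alphabetically)."""
    scored = [(w, v) for (w, c), v in validity.items() if c == category]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


# Hypothetical toy corpus, for illustration only.
docs = [
    ("sports", ["goal", "match", "team"]),
    ("sports", ["team", "coach", "win"]),
    ("finance", ["market", "stock", "win"]),
    ("finance", ["stock", "bank", "team"]),
]

validity = cue_validity(docs)
best_finance = top_cues(validity, "finance", 3)
```

In this toy corpus, "stock" appears only in finance documents and so has cue validity 1.0 for that category, while "team" appears in both categories and is a weaker predictor of either. A selection step in the spirit of the abstract would keep only the highest-scoring words per category as features.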
