Abstract

Feature weighting and feature selection are key preprocessing steps in text classification and text mining. We analyze the drawbacks of weighting formulas based on inverse document frequency (IDF) and present a novel feature weighting and selection method based on the variable precision rough set model. IDF does not take class information into account, and a criterion based on IDF is not monotonic with the contribution a feature makes to classification, which degrades classifier performance. The classification quality measure of the variable precision rough set model can handle complex classification problems and measures the contribution a feature makes to classification. We introduce it as a criterion for feature selection and weighting in text classification and name it TFACQ. Experimental results show that the weighting formula and the feature selection based on TFACQ greatly improve classification performance.
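
The contrast drawn above can be illustrated with a minimal sketch. It is not the TFACQ formula, which the abstract does not state. It assumes a toy corpus, a binary "term present / term absent" partition of the documents, and the inclusion-threshold form of variable precision rough set classification quality, in which a block of documents counts toward the positive region when at least a fraction `beta` of it belongs to a single class. The function names, the corpus, and the value `beta=0.8` are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def idf(term, docs):
    """Classic IDF, log(N / df). Class labels are never consulted."""
    df = sum(1 for tokens, _ in docs if term in tokens)
    return math.log(len(docs) / df) if df else 0.0


def vprs_quality(term, docs, beta=0.8):
    """Illustrative beta-classification quality for one term (an assumption,
    not the paper's TFACQ): the fraction of documents lying in blocks whose
    majority class covers at least `beta` of the block."""
    # Partition documents by presence/absence of the term.
    blocks = defaultdict(list)
    for tokens, label in docs:
        blocks[term in tokens].append(label)

    positive = 0
    for labels in blocks.values():
        majority = Counter(labels).most_common(1)[0][1]
        if majority / len(labels) >= beta:
            positive += len(labels)
    return positive / len(docs)


# Hypothetical toy corpus: (set of tokens, class label).
docs = [
    ({"goal", "match"}, "sport"),
    ({"goal", "team"}, "sport"),
    ({"match", "vote"}, "politics"),
    ({"vote", "team"}, "politics"),
]

for term in ("goal", "match"):
    print(term, round(idf(term, docs), 3), vprs_quality(term, docs))
```

Both terms occur in two of the four documents, so IDF assigns them the same weight, yet "goal" separates the two classes perfectly while "match" does not separate them at all. A class-aware measure distinguishes the two cases, which is the kind of behaviour the abstract attributes to a criterion based on classification quality.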
