Abstract

Feature weighting and feature selection are key preprocessing steps in text classification and text mining. We analyze the drawbacks of weighting formulas based on inverse document frequency (IDF) and present a novel feature weighting and selection method based on the variable precision rough set model. IDF does not take class information into account, and an IDF-based criterion is not monotonic with the contribution a feature makes to classification, which degrades classifier performance. The classification quality measure of the variable precision rough set model can handle complex classification problems and measures the contribution a feature makes to classification. We introduce it as a criterion for feature selection and weighting in text classification and name it TFACQ. Experimental results show that the weighting formula and feature selection based on TFACQ greatly improve performance.
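
The IDF drawback described above can be illustrated with a minimal sketch. It is not the TFACQ formula, which the abstract does not give. It assumes a toy labeled corpus (hypothetical), treats each term as a binary "present/absent" attribute, and uses the standard variable precision rough set notion of classification quality: the fraction of documents that fall in equivalence classes whose majority label reaches an inclusion threshold `beta`. The function names and the choice `beta=0.8` are assumptions made for illustration.

```python
import math
from collections import Counter


def idf(docs, term):
    """Classic inverse document frequency: log(N / df). Ignores labels."""
    df = sum(1 for words, _ in docs if term in words)
    return math.log(len(docs) / df)


def vprs_quality(docs, term, beta=0.8):
    """Classification quality of a binary 'term present' attribute.

    The corpus splits into two equivalence classes (term present / absent).
    A class counts toward the beta-positive region when its most common
    label covers at least a fraction `beta` of its documents.
    """
    blocks = {True: [], False: []}
    for words, label in docs:
        blocks[term in words].append(label)

    covered = 0
    for labels in blocks.values():
        if not labels:
            continue
        majority = Counter(labels).most_common(1)[0][1]
        if majority / len(labels) >= beta:
            covered += len(labels)
    return covered / len(docs)


# Hypothetical toy corpus: (set of terms, class label)
docs = [
    ({"goal", "match"}, "sports"),
    ({"goal", "team"}, "sports"),
    ({"goal", "report"}, "sports"),
    ({"goal", "report"}, "sports"),
    ({"stock", "report"}, "finance"),
    ({"stock", "report"}, "finance"),
    ({"stock", "market"}, "finance"),
    ({"stock", "team"}, "finance"),
]

for term in ("goal", "report"):
    print(term, round(idf(docs, term), 3), vprs_quality(docs, term))
```

Both terms occur in four of the eight documents, so IDF assigns them the same weight. "goal" appears only in sports documents while "report" is split evenly between the classes, and only the class-aware measure separates them.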

