Abstract

With the proliferation of harmful Internet content such as pornography, violence, and hate messages, effective content-filtering systems are essential. However, a non-trivial obstacle to good text filtering is the high dimensionality of the data. We introduce a hybrid method that selects features more accurately by combining a conventional feature selection method with rough set theory. Features are first selected using a method such as the χ² statistic, mutual information, or information gain, and are then reduced further using rough sets. The result is a smaller and more accurate feature set. In our experiments, we used a UCI machine learning dataset and evaluated our feature selection method with a naive Bayes model. The results show that our method achieves high precision and high recall, and is both effective and efficient.
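
The two-stage idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`chi2_score`, `dependency`, `greedy_reduct`, `hybrid_select`), the greedy forward search used for the rough-set step, and the toy data are all assumptions made for illustration. Stage one ranks discrete features by the χ² statistic and keeps the top `k`; stage two keeps only as many of those as are needed to preserve the rough-set dependency degree (the fraction of rows whose class is fully determined by the chosen features).

```python
from collections import defaultdict


def chi2_score(column, labels):
    """Chi-square statistic between one discrete feature column and the labels."""
    n = len(labels)
    observed = defaultdict(int)
    feat_count = defaultdict(int)
    class_count = defaultdict(int)
    for v, c in zip(column, labels):
        observed[(v, c)] += 1
        feat_count[v] += 1
        class_count[c] += 1
    score = 0.0
    for v, fv in feat_count.items():
        for c, cc in class_count.items():
            expected = fv * cc / n
            score += (observed.get((v, c), 0) - expected) ** 2 / expected
    return score


def dependency(rows, labels, features):
    """Rough-set dependency degree: share of rows lying in the positive region."""
    groups = defaultdict(list)
    for row, c in zip(rows, labels):
        # Rows with identical values on `features` are indiscernible.
        groups[tuple(row[f] for f in features)].append(c)
    consistent = sum(len(cs) for cs in groups.values() if len(set(cs)) == 1)
    return consistent / len(rows)


def greedy_reduct(rows, labels, candidates):
    """Greedy forward search (an assumed strategy) for a small feature subset
    with the same dependency degree as the full candidate set."""
    target = dependency(rows, labels, candidates)
    chosen = []
    while dependency(rows, labels, chosen) < target:
        best = max(
            (f for f in candidates if f not in chosen),
            key=lambda f: dependency(rows, labels, chosen + [f]),
        )
        chosen.append(best)
    return chosen


def hybrid_select(rows, labels, k):
    """Stage 1: keep the k best features by chi-square. Stage 2: rough-set reduction."""
    n_features = len(rows[0])
    scores = [chi2_score([r[j] for r in rows], labels) for j in range(n_features)]
    top_k = sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]
    return greedy_reduct(rows, labels, top_k)


# Hypothetical toy data: feature 0 determines the class, feature 1 duplicates it,
# features 2 and 3 are noise.
rows = [
    (1, 1, 0, 1),
    (1, 1, 1, 0),
    (1, 1, 0, 0),
    (0, 0, 1, 1),
    (0, 0, 0, 1),
    (0, 0, 1, 0),
]
labels = [1, 1, 1, 0, 0, 0]
selected = hybrid_select(rows, labels, k=3)
print(selected)
```

In this toy case the χ² stage keeps the two informative features plus one noise feature, and the rough-set stage then discards the redundant duplicate and the noise, which is the "fewer and more accurate features" effect the abstract describes. A greedy search does not guarantee a minimal reduct in general.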

