Abstract

With the deep penetration of the Internet, the uncontrolled flood of information has become one of the most serious problems for Internet users. Harmful content, such as pornography, violence, and other illegal material, has had a serious influence on society as a whole, and especially on young people. In this paper, a novel web text filter based on rough set theory and Bayesian theory is proposed; it analyzes the text content of web pages in order to filter out harmful pages. Some current feature selection methods, such as inverse document frequency (IDF), do not take class information into account. To avoid this shortcoming, rough sets are used to reduce the original set of feature terms. In addition, a novel coefficient weighting method based on rough sets is proposed and introduced into the Bayesian formula, which greatly improves filtering performance. In the final experiment, the proposed method is compared with other weighting methods applied in the Bayesian formula, such as TF, IDF, and TF-IDF. The results demonstrate that the proposed filter works efficiently.
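The abstract describes a two-stage pipeline: rough-set reduction of the feature terms, followed by a Bayesian classifier whose terms carry rough-set-based weights. It does not give the reduction algorithm or the weighting formula, so the Python sketch below fills both in with labeled assumptions. Reduction uses QuickReduct, a standard greedy rough-set heuristic built on the dependency degree γ. The weight w_t = 1 + sig(t), where sig(t) = γ_C(D) − γ_{C∖{t}}(D) is the usual rough-set significance of term t, is a hypothetical stand-in for the paper's coefficient. Pages are represented by binary term-presence features. All function names, terms, and labels are invented for illustration.

```python
import math
from collections import defaultdict


def dependency(rows, labels, attrs):
    """Rough-set dependency degree gamma_attrs(D) = |POS_attrs(D)| / |U|.

    Pages are grouped into equivalence classes by their values on `attrs`;
    a class belongs to the positive region if all its pages share one label.
    """
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    pos = sum(len(ids) for ids in blocks.values()
              if len({labels[i] for i in ids}) == 1)
    return pos / len(rows)


def quick_reduct(rows, labels, terms):
    """QuickReduct (greedy): repeatedly add the term that most increases the
    dependency degree until the subset discerns the classes as well as the
    full term set does. Ties go to the earliest term in `terms`."""
    target = dependency(rows, labels, terms)
    reduct = []
    while dependency(rows, labels, reduct) < target:
        best = max((t for t in terms if t not in reduct),
                   key=lambda t: dependency(rows, labels, reduct + [t]))
        reduct.append(best)
    return reduct


def significance_weights(rows, labels, terms):
    """HYPOTHETICAL coefficient: w_t = 1 + sig(t), where
    sig(t) = gamma_C(D) - gamma_{C - {t}}(D) is the rough-set significance
    of term t with respect to the full term set C."""
    full = dependency(rows, labels, terms)
    return {t: 1.0 + full - dependency(rows, labels, [u for u in terms if u != t])
            for t in terms}


class WeightedBayes:
    """Naive Bayes over binary term-presence features in which each term's
    log-likelihood is multiplied by a weight w_t (add-one smoothing)."""

    def fit(self, rows, labels, terms, weights):
        self.terms, self.weights = list(terms), weights
        self.classes = sorted(set(labels))
        self.log_prior, self.p_present = {}, {}
        for c in self.classes:
            docs = [r for r, y in zip(rows, labels) if y == c]
            self.log_prior[c] = math.log(len(docs) / len(rows))
            # P(term present | c), smoothed over the two outcomes present/absent
            self.p_present[c] = {t: (sum(d[t] for d in docs) + 1) / (len(docs) + 2)
                                 for t in self.terms}
        return self

    def predict(self, row):
        def log_score(c):
            s = self.log_prior[c]
            for t in self.terms:
                p = self.p_present[c][t] if row[t] else 1.0 - self.p_present[c][t]
                s += self.weights[t] * math.log(p)  # weighted log-likelihood
            return s
        return max(self.classes, key=log_score)


# Invented toy corpus: binary term presence per page (1 = term occurs).
TERMS = ["casino", "free", "news", "video"]
ROWS = [
    {"casino": 1, "free": 1, "news": 0, "video": 0},
    {"casino": 1, "free": 0, "news": 0, "video": 1},
    {"casino": 0, "free": 1, "news": 0, "video": 1},
    {"casino": 0, "free": 0, "news": 1, "video": 0},
    {"casino": 1, "free": 0, "news": 1, "video": 0},
    {"casino": 0, "free": 1, "news": 0, "video": 0},
]
LABELS = ["harmful", "harmful", "harmful", "normal", "normal", "normal"]

reduct = quick_reduct(ROWS, LABELS, TERMS)
weights = significance_weights(ROWS, LABELS, TERMS)
model = WeightedBayes().fit(ROWS, LABELS, reduct, weights)
print(reduct)  # ['news', 'casino', 'video']
print(model.predict({"casino": 1, "free": 1, "news": 0, "video": 1}))  # harmful
```

On this toy corpus the reduction drops "free", and the hypothetical weights come out as 4/3 for "casino" and "video" and 1 for "news" and "free". Multiplying each log-likelihood by w_t lets the more discriminative terms count for more than in plain naive Bayes. The same `weights` slot could instead hold TF, IDF, or TF-IDF values, which is roughly how the baselines named in the abstract would plug into the Bayesian formula.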
