Abstract

As resources on the Internet become plentiful, a large amount of harmful information is spreading and seriously affecting people's normal work and daily life. Harmful data streams must therefore be recognized and filtered out effectively. After analyzing harmful content in Internet information streams, we present a new method that recognizes specific information by machine learning (ML). Using the ML method, we extract key information from a number of corpora and learn its part-of-speech (POS) transfer form, based on matching key information with the same pronunciation. A testing value for key information is then obtained on a real corpus by examining, through the average POS transfer probability of the key information, how closely the matching rules found in information streams agree with those learnt from the corpora. From this, the testing value for the whole real data stream is obtained. The experiment showed that the method was efficient at recognizing certain harmful Internet information.
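
The scoring idea in the abstract, an average POS transfer probability, can be illustrated with a small sketch. This is not the authors' implementation: the function names, the tag set (V, ADJ, N), the toy sentences, and the choice to score unseen transitions as 0 are all assumptions made for illustration, and the same-pronunciation matching step is omitted entirely.

```python
from collections import Counter, defaultdict


def learn_transitions(tagged_sentences):
    """Estimate P(next_tag | tag) from sentences given as lists of (word, tag).

    Hypothetical helper: a plain maximum-likelihood estimate with no smoothing.
    """
    pair_counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        tags = [tag for _, tag in sentence]
        for prev_tag, next_tag in zip(tags, tags[1:]):
            pair_counts[prev_tag][next_tag] += 1

    probs = {}
    for prev_tag, counter in pair_counts.items():
        total = sum(counter.values())
        probs[prev_tag] = {t: n / total for t, n in counter.items()}
    return probs


def average_transition_probability(tagged_sentence, probs):
    """Average the learnt transition probabilities over adjacent tag pairs.

    Transitions never seen in training contribute 0.0 (an assumption).
    A sequence with fewer than two tags scores 0.0.
    """
    tags = [tag for _, tag in tagged_sentence]
    pairs = list(zip(tags, tags[1:]))
    if not pairs:
        return 0.0
    total = sum(probs.get(p, {}).get(n, 0.0) for p, n in pairs)
    return total / len(pairs)


# Toy training corpus (invented for illustration only).
training = [
    [("buy", "V"), ("cheap", "ADJ"), ("pills", "N")],
    [("sell", "V"), ("fake", "ADJ"), ("goods", "N")],
    [("buy", "V"), ("pills", "N")],
]
probs = learn_transitions(training)

candidate = [("get", "V"), ("cheap", "ADJ"), ("stuff", "N")]
score = average_transition_probability(candidate, probs)
print(round(score, 4))
```

A higher score means the tag sequence in the stream resembles the sequences learnt from the corpora more closely. How the score is turned into a filtering decision, for example by a threshold, is not stated in the abstract and is left open here.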
