Research of machine learning method for specific information recognition on the Internet

Dequan Zheng, Hao Yu, Yi Hu, Tiejun Zhao, Sheng Li

doi:10.1109/icmi.2002.1166998

Abstract

As the resources available on the Internet grow more plentiful, a large amount of harmful information is spreading with them and seriously affecting people's normal work and lives. Harmful data streams must therefore be recognized and filtered out effectively. After analyzing some harmful content in Internet information streams, we present a new method that recognizes specific information by machine learning (ML). Using the ML method, we extract key information from a number of corpora and learn from them the part-of-speech (POS) transfer forms of that key information, based on matching key information with the same pronunciation. A testing value for each piece of key information is then obtained on a real corpus: the average POS transfer probability of the key information measures how closely the matching rules found in the information stream resemble those learned from the corpora. From these, the testing value for the whole real data stream is obtained. The experiment showed that the method is efficient at recognizing certain harmful information on the Internet.
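
The abstract does not give the scoring formula, so the following is only a minimal sketch of one way the described pipeline might look, under stated assumptions: POS transfer probabilities are estimated as bigram tag-transition frequencies in a one-word window around each keyword, a keyword occurrence is scored by the average of those probabilities, and a stream is scored by the average over its keyword occurrences. The function names, the window size, and the toy tag set are all hypothetical, and the same-pronunciation matching step is omitted.

```python
from collections import Counter, defaultdict

def _window(sent, i):
    """Index pairs (j, j+1) for the POS transitions touching position i."""
    lo, hi = max(0, i - 1), min(len(sent) - 1, i + 1)
    return range(lo, hi)

def learn_transitions(tagged_sentences, keywords):
    """Estimate P(next_tag | prev_tag) from transitions around keywords.

    tagged_sentences: list of sentences, each a list of (word, pos) pairs.
    keywords: set of key-information words (assumed already extracted).
    """
    counts = defaultdict(Counter)
    for sent in tagged_sentences:
        for i, (word, _) in enumerate(sent):
            if word in keywords:
                for j in _window(sent, i):
                    counts[sent[j][1]][sent[j + 1][1]] += 1
    probs = {}
    for prev_tag, nxt in counts.items():
        total = sum(nxt.values())
        probs[prev_tag] = {tag: c / total for tag, c in nxt.items()}
    return probs

def keyword_score(sent, i, probs):
    """Average learned transition probability around the keyword at index i."""
    vals = [
        probs.get(sent[j][1], {}).get(sent[j + 1][1], 0.0)
        for j in _window(sent, i)
    ]
    return sum(vals) / len(vals) if vals else 0.0

def stream_score(tagged_sentences, keywords, probs):
    """Average keyword score over every keyword occurrence in the stream."""
    scores = [
        keyword_score(sent, i, probs)
        for sent in tagged_sentences
        for i, (word, _) in enumerate(sent)
        if word in keywords
    ]
    return sum(scores) / len(scores) if scores else 0.0

# Toy training corpus (hypothetical data, hand-tagged for illustration).
train = [
    [("buy", "V"), ("pills", "N"), ("now", "ADV")],
    [("cheap", "ADJ"), ("pills", "N"), ("here", "ADV")],
    [("pills", "N"), ("work", "V")],
]
keywords = {"pills"}
probs = learn_transitions(train, keywords)

familiar = [[("buy", "V"), ("pills", "N"), ("now", "ADV")]]
unfamiliar = [[("the", "DET"), ("pills", "N"), ("fell", "V")]]
print(stream_score(familiar, keywords, probs))
print(stream_score(unfamiliar, keywords, probs))
```

In this sketch a stream whose keywords appear in POS contexts resembling the training corpus receives a higher score than one whose contexts were rarely or never seen; where to set a decision threshold is left open, since the abstract does not state one.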

Full Text