Abstract

In this paper, a sensitive-word filtering system for Chinese messages in an instant messaging environment is proposed. First, each message sentence is segmented into words, and the segmentation result is corrected by an association algorithm based on information entropy and pointwise mutual information. The traditional DFA (deterministic finite automaton) algorithm is then used to build a dictionary tree (trie) of sensitive words for recognition, which effectively improves recognition speed. Second, once recognition is complete, a pre-trained word-vector model is used to compare the words in the sensitive-word list with the segmentation results, and words that are highly similar to known sensitive words are added to the list, expanding and improving the sensitive-word lexicon.
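
The abstract does not give the exact formulas, thresholds, or merge rule used to correct the segmentation. As a minimal sketch, assuming the common formulation in which two adjacent segments are joined when their pointwise mutual information (PMI) is high and the left and right branching entropy of the joined string is also high, the check could look like the Python below. The functions `merge_candidates` and `apply_merges` and the thresholds `pmi_min` and `entropy_min` are hypothetical names chosen for illustration.

```python
import math
from collections import Counter, defaultdict


def entropy(counter):
    """Shannon entropy (natural log) of a neighbour-frequency Counter."""
    total = sum(counter.values())
    return -sum((c / total) * math.log(c / total) for c in counter.values()) if total else 0.0


def merge_candidates(corpus, pmi_min=1.0, entropy_min=0.5):
    """Return adjacent token pairs that look like one word wrongly split by the segmenter.

    corpus: list of segmented sentences, each a list of tokens.
    pmi_min, entropy_min: hypothetical thresholds, to be tuned on real data.
    """
    unigrams = Counter()
    bigrams = Counter()
    left = defaultdict(Counter)   # pair -> tokens seen immediately before it
    right = defaultdict(Counter)  # pair -> tokens seen immediately after it
    for sent in corpus:
        unigrams.update(sent)
        for i in range(len(sent) - 1):
            pair = (sent[i], sent[i + 1])
            bigrams[pair] += 1
            left[pair][sent[i - 1] if i > 0 else "<s>"] += 1
            right[pair][sent[i + 2] if i + 2 < len(sent) else "</s>"] += 1
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    merged = {}
    for pair, count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[pair[0]] / n_uni
        p_y = unigrams[pair[1]] / n_uni
        pmi = math.log(p_xy / (p_x * p_y))
        # A real word should be internally cohesive (high PMI) and occur in
        # varied contexts on both sides (high branching entropy).
        branching = min(entropy(left[pair]), entropy(right[pair]))
        if pmi >= pmi_min and branching >= entropy_min:
            merged[pair] = (pmi, branching)
    return merged


def apply_merges(sent, merged):
    """Greedily join adjacent tokens whose pair was accepted as a merge."""
    out = []
    i = 0
    while i < len(sent):
        if i + 1 < len(sent) and (sent[i], sent[i + 1]) in merged:
            out.append(sent[i] + sent[i + 1])
            i += 2
        else:
            out.append(sent[i])
            i += 1
    return out
```

For example, if a segmenter keeps splitting 微信 into 微 and 信 across several messages with different neighbours, the pair scores high on both measures and `apply_merges` rejoins it, while pairs seen in only one context have zero branching entropy and are left alone.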

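The abstract names a DFA built as a dictionary tree but gives no implementation details. A common way to realize this, sketched here under that assumption, is a nested-dict trie in which each dict acts as an automaton state keyed by the next character. Each start position in a message is followed only as far as the trie allows, so the cost depends on the message length and the longest sensitive word rather than on the number of words in the list. The names `build_trie`, `find_sensitive`, `mask`, and the sentinel `END` are hypothetical, and reporting the longest match at each position and then skipping past it is one possible policy, chosen here because it suits masking.

```python
END = object()  # hypothetical sentinel key: a complete sensitive word ends at this state


def build_trie(words):
    """Build a nested-dict trie; each dict is a DFA state keyed by the next character."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def find_sensitive(text, trie):
    """Return (start, word) for the longest, non-overlapping matches in text."""
    hits = []
    i = 0
    while i < len(text):
        node = trie
        j = i
        longest = 0
        # Follow transitions until the next character has no outgoing edge.
        while j < len(text) and text[j] in node:
            node = node[text[j]]
            j += 1
            if END in node:
                longest = j - i
        if longest:
            hits.append((i, text[i:i + longest]))
            i += longest  # skip past the match so hits do not overlap
        else:
            i += 1
    return hits


def mask(text, trie, repl="*"):
    """Replace every recognized sensitive word with repl characters."""
    chars = list(text)
    for start, word in find_sensitive(text, trie):
        chars[start:start + len(word)] = repl * len(word)
    return "".join(chars)
```

With the list `["赌博", "赌博网站", "代开发票"]`, the message `这里有赌博网站和代开发票服务` yields the longer match 赌博网站 rather than its prefix 赌博, and masking produces `这里有****和****服务`.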

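The abstract does not name the pre-trained embedding model, the similarity measure, or the cutoff used for lexicon expansion. Assuming cosine similarity between word vectors and a fixed threshold, the step can be sketched in plain Python as below. In practice the vectors would come from the pre-trained model; the three-dimensional vectors in the usage example are made up purely for illustration, and `expand_lexicon` and `threshold` are hypothetical names.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def expand_lexicon(sensitive, candidates, vectors, threshold=0.8):
    """Add candidate words whose vector is close to any original sensitive word.

    sensitive:  current sensitive-word list
    candidates: words taken from the segmentation results
    vectors:    word -> embedding (assumed to come from a pre-trained model)
    threshold:  hypothetical cosine cutoff
    """
    expanded = list(sensitive)
    seen = set(sensitive)
    known = [s for s in sensitive if s in vectors]
    for word in candidates:
        if word in seen or word not in vectors:
            continue  # already listed, or no embedding available
        best = max((cosine(vectors[word], vectors[s]) for s in known), default=0.0)
        if best >= threshold:
            expanded.append(word)
            seen.add(word)
    return expanded


# Usage with toy vectors (illustrative values only):
toy_vectors = {
    "赌博": [0.9, 0.1, 0.0],
    "博彩": [0.85, 0.15, 0.05],
    "天气": [0.0, 0.2, 0.95],
}
result = expand_lexicon(["赌博"], ["博彩", "天气", "赌博", "未知词"], toy_vectors)
```

Here 博彩 is close to 赌博 and is added, 天气 is not, and 未知词 is skipped because it has no vector, so `result` is `["赌博", "博彩"]`. Comparing candidates only against the original list, rather than against words added during the same pass, is a design choice that keeps one borderline addition from pulling in a chain of less related words.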