Abstract

As the Internet has developed and become widespread, it plays an important role in people's daily life and work and in the spread of trending social information. Most China English neologisms circulate widely on Internet platforms, where people come to know and use them. New word recognition plays an important role in Chinese word segmentation and information retrieval. With large numbers of new words emerging in China English, the lack of a China English neologism database has become a major obstacle to the study of China English, and new word recognition is a major technical issue in building such a corpus. To address the weaknesses of existing new word recognition algorithms based on pointwise mutual information (PMI) and adjacency entropy, in particular in how they capture the internal cohesion of words, a new word recognition algorithm for China English is proposed. The algorithm also addresses the problems caused by identifying new words with a single PMI threshold: invalid phrases pass the threshold, and the threshold is too low for new word groups. The experimental results show that, under the same data and experimental environment, the method improves accuracy, recall, and F-value, and that it is effective and feasible for corpus construction.
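For readers unfamiliar with the two statistics named above, the following is a minimal sketch of their textbook definitions: PMI as a measure of how strongly two adjacent tokens cohere, and adjacency entropy as a measure of how freely a candidate combines with its left and right neighbors. It is not the algorithm proposed in the paper, whose details the abstract does not give. The function names, the threshold values, the `min_count` filter, and the toy sentence are all assumptions made for illustration.

```python
import math
from collections import Counter


def pmi_scores(tokens):
    """PMI of each adjacent token pair: log2(p(x, y) / (p(x) * p(y)))."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (x, y), count in bigrams.items():
        p_xy = count / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


def entropy(counter):
    """Shannon entropy (in bits) of the distribution given by the counts."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    return sum((c / total) * math.log2(total / c) for c in counter.values())


def adjacency_entropy(tokens, pair):
    """Smaller of the left- and right-neighbor entropies of a token pair."""
    left, right = Counter(), Counter()
    for i in range(len(tokens) - 1):
        if (tokens[i], tokens[i + 1]) == pair:
            if i > 0:
                left[tokens[i - 1]] += 1
            if i + 2 < len(tokens):
                right[tokens[i + 2]] += 1
    return min(entropy(left), entropy(right))


def candidate_new_words(tokens, pmi_threshold=1.0, entropy_threshold=0.5,
                        min_count=2):
    """Keep pairs that pass both thresholds (threshold values are assumed)."""
    counts = Counter(zip(tokens, tokens[1:]))
    pmi = pmi_scores(tokens)
    kept = {}
    for pair, count in counts.items():
        if count < min_count or pmi[pair] < pmi_threshold:
            continue
        h = adjacency_entropy(tokens, pair)
        if h >= entropy_threshold:
            kept[pair] = (pmi[pair], h)
    return kept


if __name__ == "__main__":
    # Toy, hand-written text; not data from the paper.
    text = ("we say add oil to friends and they say add oil to us "
            "and fans shout add oil loudly")
    for pair, (score, h) in candidate_new_words(text.split()).items():
        print(pair, round(score, 2), round(h, 2))
```

In this toy text the pairs "say add" and "add oil" have the same PMI, but "say add" is always followed by "oil", so its right adjacency entropy is zero and it is rejected. This illustrates why a single PMI threshold alone can let invalid phrases through.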
