Abstract
An English word is a sound or a combination of sounds produced with the vocal organs. Words are representative: they carry meaning, help humans communicate, and form part of the human language communication system. In short, words are the smallest meaningful language units that humans can use independently. This paper studies the English neologism corpus based on the decision tree algorithm. New word recognition algorithms based on pointwise mutual information (PMI) and adjacency entropy suffer from two problems: the extracted words have low internal cohesion, and a single PMI threshold leaves many invalid phrases above the threshold and many valid new phrases below it. To address these problems, an improved English new word extraction algorithm based on multi-word pointwise mutual information and adjacency entropy is proposed. In the preprocessing stage, candidates are further filtered according to the characteristics of English new words. Pointwise mutual information is then extended to multi-word pointwise mutual information, and new words are extracted by setting double thresholds together with adjacency entropy. Recognition is treated as a classification problem: English new words are analyzed, and cosine similarity and path distance similarity are used to quantify word form and word meaning features. For phrase pairs containing acronyms, a rule-based algorithm for identifying acronyms is proposed, and a decision tree suited to English new word recognition is constructed from the word form and word meaning features. Experiments show that the proposed algorithm outperforms the traditional algorithm: the P value reaches 33.8%, and the accuracy rate is greatly improved. Compared with the word sense feature recognition algorithm, the proposed method improves accuracy by 4%.
Keywords: Decision Tree Algorithm, Synonym Identification, New Words, New Word Corpus
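The extraction step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formulas or the threshold rule, so everything below is an assumption made for illustration: multi-word PMI is taken as the log ratio of the n-gram probability to the product of its word probabilities, adjacency entropy is the smaller of the left and right neighbor entropies, and the double-threshold rule (lenient entropy check above the high PMI threshold, stricter check between the two thresholds) is hypothetical. The function names and default threshold values are likewise invented here and are not the paper's.

```python
import math
from collections import Counter, defaultdict


def multiword_pmi(gram, gram_counts, word_counts, total_grams, total_words):
    """Assumed definition: log2( p(w1..wn) / (p(w1) * ... * p(wn)) )."""
    p_gram = gram_counts[gram] / total_grams
    p_indep = math.prod(word_counts[w] / total_words for w in gram)
    return math.log2(p_gram / p_indep)


def entropy(counter):
    """Shannon entropy (bits) of a neighbor-frequency counter; 0.0 if empty."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    return sum(-(c / total) * math.log2(c / total) for c in counter.values())


def adjacency_entropy(tokens, n):
    """Map each n-gram to min(left-neighbor entropy, right-neighbor entropy)."""
    left, right = defaultdict(Counter), defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        gram = tuple(tokens[i:i + n])
        if i > 0:
            left[gram][tokens[i - 1]] += 1
        if i + n < len(tokens):
            right[gram][tokens[i + n]] += 1
    grams = set(left) | set(right)
    return {g: min(entropy(left[g]), entropy(right[g])) for g in grams}


def extract_new_words(tokens, n=2, pmi_low=1.0, pmi_high=3.0,
                      ent_low=0.5, ent_high=1.0):
    """Hypothetical double-threshold rule (not taken from the paper):
    - PMI >= pmi_high: accept if adjacency entropy >= ent_low
    - pmi_low <= PMI < pmi_high: accept only if adjacency entropy >= ent_high
    - PMI < pmi_low: reject
    """
    word_counts = Counter(tokens)
    grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    gram_counts = Counter(grams)
    ent = adjacency_entropy(tokens, n)
    accepted = []
    for gram in gram_counts:
        pmi = multiword_pmi(gram, gram_counts, word_counts,
                            len(grams), len(tokens))
        h = ent.get(gram, 0.0)
        if pmi >= pmi_high and h >= ent_low:
            accepted.append(gram)
        elif pmi_low <= pmi < pmi_high and h >= ent_high:
            accepted.append(gram)
    return sorted(accepted)


if __name__ == "__main__":
    sample = "the deep fake was a deep fake and every deep fake is new".split()
    print(extract_new_words(sample))
```

On the toy sentence in the example, only the bigram ("deep", "fake") passes: it has a moderate PMI but varied neighbors on both sides, whereas every bigram that occurs once has an adjacency entropy of zero and is rejected regardless of its PMI.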