Abstract

Chinese sentences contain many temporarily constructed dynamic words. Dynamic words are sentence-building units that are not included in the general lexicon and are not suitable for further syntactic analysis. Recognizing and analyzing them automatically plays an important role in improving the efficiency and accuracy of automatic syntactic analysis of Chinese. Existing research on dynamic words focuses mainly on the qualitative description of concepts and categories; there has been no overall algorithm design or experimental exploration of their automatic recognition. In the practice of automatic syntactic analysis, dynamic words are generally segmented and their components analyzed syntactically, while the recognition and analysis of each dynamic word as a whole is ignored. This study instead separates dynamic words from syntactic analysis, treats them as part of lexical analysis, and recognizes and analyzes each one as a whole. Using knowledge engineering methods, the paper studies dynamic words for Chinese automatic syntactic analysis based on sentence pattern structure. It first designs a knowledge representation method for dynamic words, then constructs a dynamic word structural mode knowledge base by annotating the dynamic words in a sizable corpus of international Chinese textbooks, and finally explores automatic recognition methods based on regular expressions, semantic category combinations, and machine learning classification algorithms. The experimental results show that the three algorithms can cover the recognition of all types of dynamic words and achieve relatively good accuracy and recall.

Highlights

  • There are many temporarily constructed words and intermediate state combinations between words and phrases in Chinese sentences [1], [2]

  • This paper focuses on the knowledge representation and automatic recognition of dynamic words in Chinese text for automatic syntactic analysis based on sentence pattern structure [13]

  • Building on the dynamic word structural mode knowledge base for Chinese information processing, this paper proposes three dynamic word recognition algorithms, based on regular expressions, semantic category combinations, and machine learning classification


Summary

INTRODUCTION

Chinese sentences contain many temporarily constructed words and intermediate-state combinations between words and phrases [1], [2]. This paper is organized as follows. Chapter 1, the introduction, presents the research background and significance, the state of research, the research content and contributions, and the organization of the article. Chapter 2, on sentence pattern structure syntactic parsing, introduces syntactic analysis in Chinese information processing and the formalization, syntax, and morphology of sentence pattern structure. Chapter 3, on the knowledge representation of dynamic words, describes the knowledge representation method for dynamic word structural modes, the corpus annotation, and the construction of the structural mode knowledge base. Chapter 4, on the automatic recognition of dynamic words, presents the recognition algorithms based on regular expressions, semantic category combinations, and machine learning classification, together with the related experiments. Chapter 5, the conclusion, summarizes the research and outlines future work.

SENTENCE PATTERN STRUCTURE SYNTACTIC PARSING
CONSTRUCTION OF THE STRUCTURAL MODE KNOWLEDGE BASE
AUTOMATIC RECOGNITION OF DYNAMIC WORDS
Findings
CONCLUSION