Abstract

Separable words have important applications in fields such as Chinese information processing, Chinese-English translation, and teaching Chinese as a foreign language. About five thousand separable words are distributed across Chinese corpora, and their frequency is higher in novels, so the identification of separable words is a significant problem. This paper takes verb-object separable words with a relatively high discrete (separated-use) frequency as its object of study. By examining how extended components manifest in different separable words, and by summarizing and classifying those extended components in detail over a large-scale corpus, a new identification approach is designed based on word segmentation and the structure type of the extended component. In experiments on identifying and marking verb-object separable words, the average recall is 89.54% and the average precision is 87.43% in the open test. The experimental results show that the method is effective.
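
For readers unfamiliar with the two reported measures, the following is a minimal sketch of how precision and recall are conventionally computed for an identification task of this kind. It is not the paper's evaluation code: the function name `precision_recall`, the tuple layout (sentence id, verb, object), and the toy data are all assumptions made for illustration.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) for predicted vs. gold-standard items."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Precision: share of identified items that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of gold-standard items that were identified.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical toy data: (sentence_id, verb, object) for each separated
# verb-object word. These are illustrative, not drawn from the paper's corpus.
gold = {(1, "洗", "澡"), (2, "见", "面"), (3, "帮", "忙"), (4, "睡", "觉")}
predicted = {(1, "洗", "澡"), (2, "见", "面"), (3, "帮", "忙"), (5, "吃", "饭")}

p, r = precision_recall(predicted, gold)
print(f"precision={p:.2%} recall={r:.2%}")  # precision=75.00% recall=75.00%
```

In this toy case three of the four identified items are correct and three of the four gold items are found, so both measures come out at 75%.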
