Abstract

This paper presents a Conditional Random Field (CRF) method for identifying prepositional phrases (PPs) in Chinese patent documents. With the CRF model, identification is cast as a sequence labelling problem. After analyzing the characteristics of PP chunks in a large-scale corpus, we design several features and feature templates for recognizing PP chunks, and then use a CRF toolkit to train a model that identifies PPs. Finally, experiments are conducted to evaluate the model: both precision and recall exceed 92%, higher than the baseline, indicating that the method is reasonable and effective.
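To make the sequence-labelling framing concrete, the following is a minimal sketch of how window-based feature templates might be expanded for each token before being passed to a CRF toolkit. Everything in it is an assumption for illustration: the abstract does not list the actual features or templates, so the window size, the POS tags, the BIO labels, and the names `token_features` and `sentence_features` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical token representation: (word, part-of-speech tag).
Token = Tuple[str, str]


def token_features(sent: List[Token], i: int, window: int = 2) -> Dict[str, str]:
    """Expand illustrative context-window templates for position i."""
    feats = {"bias": "1"}
    for off in range(-window, window + 1):
        j = i + off
        if 0 <= j < len(sent):
            word, pos = sent[j]
        else:
            # Positions outside the sentence get a padding symbol.
            word, pos = "<PAD>", "<PAD>"
        feats[f"w[{off}]"] = word
        feats[f"p[{off}]"] = pos
    # Bigram template joining the previous and current POS tags.
    prev_pos = sent[i - 1][1] if i > 0 else "<PAD>"
    feats["p[-1]|p[0]"] = f"{prev_pos}|{sent[i][1]}"
    return feats


def sentence_features(sent: List[Token]) -> List[Dict[str, str]]:
    """One feature dictionary per token, aligned with the label sequence."""
    return [token_features(sent, i) for i in range(len(sent))]


# Toy sentence with assumed segmentation, POS tags, and BIO chunk labels.
sentence = [("该", "DT"), ("工具", "NN"), ("为", "P"),
            ("装置", "NN"), ("提供", "VV"), ("泵", "NN")]
labels = ["O", "O", "B-PP", "I-PP", "O", "O"]
features = sentence_features(sentence)
```

The resulting per-token dictionaries and the aligned label list are the usual input shape for a linear-chain CRF trainer; which toolkit and templates the authors used is not stated in this excerpt.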

Highlights

  • Prepositional phrases (PPs), a traditionally important phrase type, are widely distributed in Chinese patent documents

  • We conducted experiments to evaluate the method, and the results showed that the proposed approach significantly improves the identification of Chinese prepositional phrases (PPs)

  • We manually extracted 1017 sentences containing PP chunks as the final test set from the development set of the patent MT subtask in the NTCIR-9 workshop, which is composed of 2000 parallel Chinese-English sentences


Summary

Introduction

Prepositional phrases (PPs), a traditionally important phrase type, are widely distributed in Chinese patent documents. According to Li et al. (2014), 226 of 500 randomly sampled patent sentences contained PP chunks, accounting for 45.2% of the sample. Compared with other Chinese domain texts, PP chunks in patent documents tend to have more specific features. To begin with, they usually have longer and more complex structures with more words: they can be composed of a preposition (prep.) followed by a noun phrase (NP), a verb phrase (VP), or even a clause. The following is an example from a patent text: 该真空工具[PP1 通过[PP3 在控制器中]连接这些网络环片段][PP2 为实验装置]提供一个低温泵。(The vacuum tool can provide a cryopump for the experiment instrument by connecting the network ring parts in the controller.)
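The bracketed example above contains one PP nested inside another. As a rough sketch of how such an annotation could be turned into training labels, the code below parses the `[PPn ...]` brackets into character spans and then assigns BIO tags to the outermost PPs only. The character-level tagging, the B-PP/I-PP label names, and the choice to keep only outermost chunks are assumptions made for illustration; the excerpt does not say how the authors segment the text or treat nesting, and the function names are hypothetical.

```python
import re
from typing import List, Tuple

Span = Tuple[str, int, int]  # (label, start, end), end exclusive

OPEN = re.compile(r"\[(PP\d+)\s*")


def parse_pp_brackets(annotated: str) -> Tuple[str, List[Span]]:
    """Strip [PPn ...] brackets and return plain text plus character spans."""
    chars: List[str] = []
    stack: List[Tuple[str, int]] = []  # open brackets: (label, start offset)
    spans: List[Span] = []
    i = 0
    while i < len(annotated):
        m = OPEN.match(annotated, i)
        if m:
            stack.append((m.group(1), len(chars)))
            i = m.end()
        elif annotated[i] == "]":
            label, start = stack.pop()
            spans.append((label, start, len(chars)))
            i += 1
        else:
            chars.append(annotated[i])
            i += 1
    return "".join(chars), spans


def outermost_bio(length: int, spans: List[Span]) -> List[str]:
    """Character-level BIO tags, ignoring PPs nested inside another PP."""
    tags = ["O"] * length
    covered_end = 0
    # Sort by start, longest first, so an enclosing span precedes its children.
    for _, start, end in sorted(spans, key=lambda s: (s[1], -s[2])):
        if start < covered_end:
            continue  # nested inside an already accepted span
        tags[start] = "B-PP"
        for k in range(start + 1, end):
            tags[k] = "I-PP"
        covered_end = end
    return tags


annotated = "该真空工具[PP1 通过[PP3 在控制器中]连接这些网络环片段][PP2 为实验装置]提供一个低温泵。"
text, spans = parse_pp_brackets(annotated)
tags = outermost_bio(len(text), spans)
for label, start, end in spans:
    print(label, text[start:end])
```

Under these assumptions, PP3 is absorbed into PP1, so the sentence yields two labelled chunks (PP1 and PP2). A scheme that also labels inner PPs would need a different tag set or a second labelling pass.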
