Abstract

This paper describes our approach to automatically identifying paired Discourse Connectives (DCs) in Chinese texts. DCs are terms that connect two text spans and signal the discourse relation between them. Most DCs consist of consecutive words (e.g., as a result); paired DCs, however, are composed of non-consecutive words that together signal the discourse relation (e.g., on one hand … on the other hand). Although paired DCs are not common in English, they are very frequent in Chinese. The contribution of this paper is two-fold. First, we propose a methodology for the automatic identification of Chinese paired DCs. Second, we present a new corpus, based on the Chinese Discourse Treebank (CDTB) [1], annotated with paired DCs. To identify paired DCs, we experimented with two main approaches: hypothesis testing and supervised machine learning. Although the hypothesis testing approaches led to lower than expected results, the simple machine learning models achieved F-scores between 72.5% and 75.6% with no fine-tuning.
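To make the notion of "non-consecutive" concrete, the following minimal Python sketch looks up candidate paired DCs in a tokenized sentence. It is an illustration only and not the method evaluated in this paper: the lexicon `PAIRED_DCS`, the helper names, and the pre-tokenized input are all assumptions made for the example, and a lexicon match yields only a candidate, not a decision that the pair actually functions as a connective.

```python
from typing import List, Sequence, Tuple

Part = Tuple[str, ...]

# Hypothetical mini-lexicon of paired connectives (illustrative only;
# not drawn from the corpus described in the abstract).
PAIRED_DCS: List[Tuple[Part, Part]] = [
    (("因为",), ("所以",)),   # "because ... therefore"
    (("虽然",), ("但是",)),   # "although ... but"
    (("on", "one", "hand"), ("on", "the", "other", "hand")),
]


def find_part(tokens: Sequence[str], part: Part, start: int = 0) -> int:
    """Return the first index >= start where `part` occurs in `tokens`, or -1."""
    n = len(part)
    for i in range(start, len(tokens) - n + 1):
        if tuple(tokens[i:i + n]) == part:
            return i
    return -1


def find_paired_candidates(tokens: Sequence[str]) -> List[Tuple[Part, Part, int, int]]:
    """List (first, second, i, j) for each lexicon pair whose two parts occur
    in order with at least one token between them."""
    matches = []
    for first, second in PAIRED_DCS:
        i = find_part(tokens, first)
        if i < 0:
            continue
        # Start the search one token past the end of the first part,
        # so the two parts are never adjacent.
        j = find_part(tokens, second, i + len(first) + 1)
        if j >= 0:
            matches.append((first, second, i, j))
    return matches


if __name__ == "__main__":
    # Assumes the sentence has already been segmented into words.
    sentence = ["因为", "下雨", "，", "所以", "比赛", "取消"]
    print(find_paired_candidates(sentence))
```

A match of this kind says only that both parts are present in the right order. Whether they jointly signal a discourse relation in context is the identification problem the abstract refers to.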
