This paper proposes incorporating linguistic semantic information into discourse relation recognition and presents the Semantic Augmented Chinese Discourse Corpus (SACA), which comprises 9546 adversative complex sentences. For adversative complex sentences, we propose a quadruple (P, Q, R, Qβ) to represent the internal semantic elements: P denotes the premise, R denotes the adversative reason, and the semantic opposition between Q and Qβ forms the basis of the adversative relationship. The overall annotation approach follows the Penn Discourse Treebank (PDTB), except for the classification of senses, for which we draw on the Chinese Discourse Treebank (CDTB) to obtain eight sense categories for Chinese adversative complex sentences. Based on this corpus, we explore the relationship between sense classification and internal semantic elements in our newly proposed Chinese Adversative Discourse Relation Recognition (CADRR) task. Using deep learning techniques, we build several classification models, including one that uses internal semantic element features, and the results demonstrate the effectiveness of these models and the applicability of the SACA corpus. Compared with pre-trained models, our model, which incorporates internal semantic element information, achieves state-of-the-art performance.