Abstract

Implicit discourse relation recognition is the performance bottleneck of discourse structure analysis. To alleviate the shortage of training data, previous methods usually use explicit discourse data, which are naturally labeled by connectives, as additional training data. However, such methods often struggle to integrate large amounts of explicit discourse data because of the noise these data introduce. In this paper, we propose a simple and effective method for leveraging massive explicit discourse data. Specifically, we learn connective-based word embeddings (CBWE) by performing connective classification on explicit discourse data. The learned CBWE capture discourse relationships between words and can serve as pre-trained word embeddings for implicit discourse relation recognition. On both the English PDTB and the Chinese CDTB datasets, CBWE yield significant improvements over baselines with general word embeddings and outperform baselines that integrate explicit discourse data directly. By combining CBWE with a strong baseline, we achieve state-of-the-art performance.
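As a rough illustration of the connective-classification pretraining described above, the sketch below trains an embedding table by predicting the connective that joins two arguments from explicit discourse data, then extracts that table as CBWE. This is not the paper's exact architecture: the vocabulary size, connective inventory, mean pooling, and single linear classifier are all placeholder assumptions.

```python
import torch
import torch.nn as nn

# Placeholder assumptions, not values from the paper.
VOCAB_SIZE = 50_000      # assumed vocabulary size
EMBED_DIM = 300          # assumed embedding dimensionality
NUM_CONNECTIVES = 100    # assumed number of connective classes (e.g. "but", "because")

class ConnectiveClassifier(nn.Module):
    """Predict the connective linking two arguments; the embedding table
    trained this way serves as connective-based word embeddings (CBWE)."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        self.classifier = nn.Linear(2 * EMBED_DIM, NUM_CONNECTIVES)

    def forward(self, arg1_ids, arg2_ids):
        # Mean-pool each argument's word embeddings (a simplifying assumption).
        a1 = self.embed(arg1_ids).mean(dim=1)
        a2 = self.embed(arg2_ids).mean(dim=1)
        return self.classifier(torch.cat([a1, a2], dim=-1))

model = ConnectiveClassifier()
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters())

# One toy training step; random tensors stand in for batches of
# explicit-discourse argument pairs labeled by their connective.
arg1 = torch.randint(0, VOCAB_SIZE, (8, 20))      # arg1 token ids
arg2 = torch.randint(0, VOCAB_SIZE, (8, 20))      # arg2 token ids
labels = torch.randint(0, NUM_CONNECTIVES, (8,))  # connective labels
optimizer.zero_grad()
loss = loss_fn(model(arg1, arg2), labels)
loss.backward()
optimizer.step()

# After training, the embedding weights are the CBWE; they can initialize
# the embedding layer of an implicit discourse relation recognizer.
cbwe = model.embed.weight.detach()
```

The key design point is that the supervision signal (the connective) is free: it comes from the explicit data itself, so the embeddings absorb discourse relationships between words without any manual annotation.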
