Hierarchically constructing micro (i.e., intra-sentence or inter-sentence) discourse structure trees using explicit boundaries (e.g., sentence and paragraph boundaries) has proven to be an effective strategy. However, this strategy is difficult to apply to document-level macro (i.e., inter-paragraph) discourse parsing, a more challenging task, because explicit boundaries are lacking at this higher level. To alleviate this issue, we introduce a topic segmentation mechanism that detects implicit topic boundaries and thereby helps the document-level macro discourse parser construct better discourse trees hierarchically. Specifically, our parser first splits a document into several sections at the topic boundaries detected by the topic segmentation mechanism. It then builds a smaller and more accurate discourse subtree within each section and sequentially combines these subtrees into a complete tree for the document. Experimental results on both the Chinese MCDTB and the English RST-DT show that our proposed method significantly outperforms state-of-the-art baselines.
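The following is a minimal sketch of the hierarchical control flow described above: split at topic boundaries, parse each section, then combine the section subtrees. It is not the authors' parser. The topic boundaries are assumed to be given (as the output of some topic segmenter), and the learned discourse parser is replaced by a hypothetical left-branching merge (`build_subtree`), used only as a placeholder. All names (`split_by_boundaries`, `build_subtree`, `hierarchical_parse`, `to_brackets`) are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List, Sequence, Union


@dataclass
class Node:
    """Internal node of a binary discourse tree (relation labels omitted)."""
    left: "Tree"
    right: "Tree"


# A leaf is a paragraph index; internal nodes join two subtrees.
Tree = Union[int, Node]


def split_by_boundaries(n_paragraphs: int, boundaries: Sequence[int]) -> List[List[int]]:
    """Split paragraphs 0..n-1 into sections.

    Each boundary b (assumed to come from a topic segmenter) means a new
    section starts at paragraph b. Out-of-range boundaries are ignored.
    """
    sections, start = [], 0
    for b in sorted(set(boundaries)):
        if 0 < b < n_paragraphs:
            sections.append(list(range(start, b)))
            start = b
    sections.append(list(range(start, n_paragraphs)))
    return sections


def build_subtree(units: Sequence[Tree]) -> Tree:
    """Placeholder for a learned parser: merge adjacent units left to right."""
    tree = units[0]
    for unit in units[1:]:
        tree = Node(tree, unit)
    return tree


def hierarchical_parse(n_paragraphs: int, boundaries: Sequence[int]) -> Tree:
    """Parse each topic section separately, then combine the section subtrees."""
    sections = split_by_boundaries(n_paragraphs, boundaries)
    subtrees = [build_subtree(section) for section in sections]
    return build_subtree(subtrees)


def to_brackets(tree: Tree) -> str:
    """Render a tree in bracket notation, e.g. ((P0 P1) P2)."""
    if isinstance(tree, int):
        return f"P{tree}"
    return f"({to_brackets(tree.left)} {to_brackets(tree.right)})"


if __name__ == "__main__":
    # Five paragraphs with one detected topic boundary before paragraph 2.
    print(to_brackets(hierarchical_parse(5, [2])))  # ((P0 P1) ((P2 P3) P4))
```

Because each section is parsed on its own, no subtree spans a topic boundary. The boundaries therefore constrain the final tree in the same way that sentence and paragraph boundaries constrain micro-level parsing. In a real system, `build_subtree` would be replaced by the trained macro discourse parser.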