Boundary-aware Dual Biaffine Model for Sequential Sentence Classification in Biomedical Documents.

Junwen Duan, Huai Guo, Han Jiang, Fei Guo, Jianxin Wang

doi:10.1109/tcbb.2024.3376566

Abstract

Assigning appropriate rhetorical roles, such as "background," "intervention," and "outcome," to sentences in biomedical documents can help physicians locate evidence and resources for medical treatment and decision-making. Sequence labeling and span-based methods are frequently employed for this task, but each has drawbacks. Sequence labeling disregards a document's semantic structure, resulting in a lack of semantic coherence across consecutive sentences. Span-based approaches either require enumerating all potential spans, which can be time-consuming, or may misclassify sentences over extended spans. An approach is therefore needed that explicitly models the semantic structure of documents and captures boundary information to achieve precise and effective sentence labeling in biomedical documents. To address these challenges, we propose a new approach, the boundary-aware dual biaffine model, which explicitly models the semantic structure of documents and incorporates boundary information via a dual biaffine layer. We introduce a dynamic programming algorithm that minimizes missing labels and overlapping predictions and achieves globally optimal decoding results. We evaluate our approach on three benchmark datasets: PubMed 20k RCT, PubMed-PICO, and NICTA-PIBOSO. The experimental results demonstrate that our approach outperforms strong baselines and achieves state-of-the-art performance on PubMed 20k RCT and PubMed-PICO. Our method also achieves competitive results on NICTA-PIBOSO. Availability: Our code and data will be available at https://github.com/CSU-NLP-Group/Sequential-Sentence-Classification.
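The abstract does not spell out the decoding algorithm, so the following is only a minimal sketch of how dynamic programming can select labeled spans that cover every sentence exactly once (no missing labels, no overlaps) while maximizing total score. The function name `decode_segmentation`, the score dictionary layout, and the example scores are hypothetical and chosen for illustration; they are not taken from the paper or its repository.

```python
from typing import Dict, List, Optional, Tuple

# A labeled span: (start sentence index, end sentence index inclusive, label).
Span = Tuple[int, int, str]


def decode_segmentation(n: int, span_scores: Dict[Span, float]) -> List[Span]:
    """Choose spans that tile sentences 0..n-1 exactly once with maximal total score.

    Illustrative assumption: span_scores holds one score per candidate labeled
    span, e.g. produced by a span scorer such as a biaffine layer.
    """
    neg_inf = float("-inf")
    # best[k] is the best total score for tiling the first k sentences.
    best: List[float] = [neg_inf] * (n + 1)
    # back[k] is the last span used in the best tiling of the first k sentences.
    back: List[Optional[Span]] = [None] * (n + 1)
    best[0] = 0.0

    for k in range(1, n + 1):
        for (start, end, label), score in span_scores.items():
            # Only spans ending at sentence k-1 can close a tiling of length k.
            if end != k - 1 or best[start] == neg_inf:
                continue
            candidate = best[start] + score
            if candidate > best[k]:
                best[k] = candidate
                back[k] = (start, end, label)

    if best[n] == neg_inf:
        raise ValueError("candidate spans cannot cover all sentences")

    # Walk the back-pointers from the end of the document to the start.
    result: List[Span] = []
    k = n
    while k > 0:
        span = back[k]
        assert span is not None
        result.append(span)
        k = span[0]
    result.reverse()
    return result


# Hypothetical scores for a four-sentence abstract.
example_scores: Dict[Span, float] = {
    (0, 0, "background"): 1.5,
    (0, 1, "background"): 2.0,
    (1, 1, "intervention"): 0.2,
    (1, 3, "outcome"): 2.0,
    (2, 2, "intervention"): 1.0,
    (2, 3, "outcome"): 3.0,
    (3, 3, "outcome"): 1.0,
}
decoded = decode_segmentation(4, example_scores)
```

Because every prefix of the document is tiled optimally before longer prefixes are considered, the returned segmentation is globally optimal over the supplied candidates. This sketch rescans all candidates for each prefix for brevity; grouping candidates by end index would avoid that.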

Full Text