Document-level event argument extraction (EAE), a critical task for event knowledge acquisition, aims to identify event arguments beyond the sentence level. Previous approaches to this task adopt supervised learning, which suffers from data scarcity, especially in low-resource situations. To tackle this challenge, we propose a new method for document-level EAE that mitigates data scarcity from two perspectives: self-augmentation and cross-domain joint training. On the one hand, our method generates additional training samples from an existing dataset with pre-trained language models, considerably expanding the training set; on the other hand, it learns from out-of-domain datasets with different event schemas through a cross-domain joint training framework. To cope with data quality problems and domain mismatch, we further introduce a noise filtering method based on a teacher-student framework. Extensive experiments demonstrate that our approach achieves state-of-the-art performance on benchmark datasets and is especially effective in low-resource settings. In summary, our contributions are:
• A new approach that addresses the data scarcity issue in document-level event argument extraction from two complementary perspectives.
• A self-augmentation technique that combines pre-trained language models with a label-conditioned pre-training procedure to preserve word-label consistency.
• A cross-domain joint training framework that transfers knowledge from datasets that differ in granularity, task specification, and event schema description language.
• A novel noise filtering method within a teacher-student framework to mitigate data quality issues.
• State-of-the-art performance on standard datasets.
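The abstract names a teacher-student noise filter for augmented samples but gives no implementation details. The sketch below is a generic, hypothetical illustration of that general idea, not the authors' actual method: a teacher scorer rates each augmented sample and only high-confidence ones are kept. All names (`Sample`, `filter_augmented_samples`, `conf_threshold`, the toy teacher) are assumptions introduced for illustration.

```python
# Hypothetical sketch of teacher-student noise filtering over augmented samples.
# All identifiers here are illustrative and do not come from the paper.

from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Sample:
    text: str                        # document (or window) text
    event_type: str                  # event type label
    arguments: List[Tuple[str, str]] # (role, span) pairs proposed by the augmenter


def filter_augmented_samples(
    samples: List[Sample],
    teacher_score: Callable[[Sample], float],
    conf_threshold: float = 0.9,
) -> List[Sample]:
    """Keep only augmented samples the teacher scores as reliable.

    `teacher_score` is assumed to return the teacher's confidence that the
    sample's argument annotations are consistent with its text.
    """
    return [s for s in samples if teacher_score(s) >= conf_threshold]


if __name__ == "__main__":
    # Toy teacher: trusts samples whose argument spans all appear verbatim in the text.
    def toy_teacher(sample: Sample) -> float:
        return 1.0 if all(span in sample.text for _, span in sample.arguments) else 0.0

    augmented = [
        Sample("The company acquired the startup in Berlin.", "Acquisition",
               [("Acquirer", "The company"), ("Place", "Berlin")]),
        Sample("Heavy rain flooded the valley.", "Acquisition",
               [("Acquirer", "the startup")]),  # inconsistent: should be filtered out
    ]
    kept = filter_augmented_samples(augmented, toy_teacher)
    print(f"kept {len(kept)} of {len(augmented)} augmented samples")
```

In practice the teacher would itself be a trained EAE model rather than a string-matching heuristic, but the keep/drop decision based on a confidence threshold follows the same pattern.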