N-ary relation extraction models must be trained on a large amount of high-quality data, but such data is difficult to obtain in practice; models are therefore forced to rely on a limited amount of low-quality labeled data. This paper proposes an active learning method for cross-sentence n-ary relation extraction, addressing the following question: “How can we train an n-ary relation extraction model incrementally without a large amount of high-quality data?” To answer this research question, we introduce a schema-aware sampling strategy that selects informative samples from the unlabeled dataset for model training. This strategy exploits the structural relatedness between a relation and its entities to generate context embeddings of inferred relations. Using the similarity between clusters of these context embeddings and candidate samples, we identify a set of informative samples in the unlabeled dataset. Moreover, the paper proposes a balanced incremental learning method that updates the extraction model without bias at only a small computational cost per training iteration. Experimental results on benchmark datasets confirm the validity and effectiveness of the proposed method.
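The core of the sampling strategy is selecting unlabeled samples by their similarity to clusters of relation context embeddings. The abstract does not specify the clustering algorithm or similarity measure, so the following is a minimal sketch under assumed choices (plain k-means and cosine similarity); the function name and parameters are illustrative, not the paper's actual API.

```python
import numpy as np

def select_informative(unlabeled_emb, relation_emb, n_clusters=2, k=1, seed=0):
    """Illustrative sketch: rank unlabeled samples by cosine similarity to
    clusters of relation context embeddings, and return the k samples that
    are least similar to any cluster (treated here as most informative).
    Assumes k-means clustering and cosine similarity; the paper's actual
    choices may differ."""
    rng = np.random.default_rng(seed)
    # Simple k-means over the context embeddings of inferred relations.
    centroids = relation_emb[rng.choice(len(relation_emb), n_clusters,
                                        replace=False)].copy()
    for _ in range(10):
        dists = np.linalg.norm(relation_emb[:, None] - centroids[None], axis=-1)
        labels = dists.argmin(axis=1)
        for c in range(n_clusters):
            members = relation_emb[labels == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    # Cosine similarity of each unlabeled sample to its nearest centroid.
    sim = (unlabeled_emb @ centroids.T) / (
        np.linalg.norm(unlabeled_emb, axis=1, keepdims=True)
        * np.linalg.norm(centroids, axis=1))
    nearest_sim = sim.max(axis=1)
    # Lowest similarity = farthest from known relation contexts = informative.
    return np.argsort(nearest_sim)[:k]
```

In this toy setting, a sample whose context embedding points away from every relation cluster is ranked first for labeling, which matches the intuition that such samples carry information the current model has not yet seen.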