Abstract

Bibliographic references, typically comprising author names, journal titles, paper titles, and publication dates, play a vital role in academic research. Accurately identifying these structured pieces of information from references is a crucial step in developing intelligent bibliographic management systems. However, existing methods often rely on extensive high-quality training data. To reduce this reliance, we propose CONT_Prompt_ParseRef, a method that integrates prompt learning and contrastive learning for extracting structured information from bibliographic references. The approach uses contrastive learning to deepen the model's understanding of the different metadata label types and prompt learning to provide explicit guidance for their processing and recognition. We constructed a dataset of 12,000 samples, available in both Chinese and English versions. Experimental results on this bilingual dataset demonstrate the model's superior performance over existing techniques. Notably, CONT_Prompt_ParseRef shows strong robustness in low-resource settings: in scenarios with limited training data, both contrastive and prompt learning play pivotal roles in extracting labels from bibliographic references. The ablation study shows that omitting either component leads to a decline in performance, with contrastive learning being slightly more influential.
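
To make the combination of the two learning signals concrete, the following is a minimal sketch, not the paper's actual architecture: it tags each token of a reference string with a metadata label, prepends a learnable soft prompt as a stand-in for prompt-based guidance, and adds a supervised contrastive loss that pulls together token representations sharing the same label. The label set, model sizes, loss weighting, and toy data are all illustrative assumptions.

```python
# Illustrative sketch (not CONT_Prompt_ParseRef itself): a reference-string tagger
# combining a learnable soft prompt with a supervised contrastive loss over
# token representations that share the same metadata label.
import torch
import torch.nn as nn
import torch.nn.functional as F

LABELS = ["O", "AUTHOR", "TITLE", "JOURNAL", "DATE"]   # hypothetical label set
VOCAB, DIM, PROMPT_LEN = 1000, 64, 8                   # toy sizes for the sketch


class PromptTagger(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, DIM)
        # Learnable soft-prompt vectors prepended to every reference string.
        self.prompt = nn.Parameter(torch.randn(PROMPT_LEN, DIM) * 0.02)
        layer = nn.TransformerEncoderLayer(DIM, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(DIM, len(LABELS))

    def forward(self, token_ids):
        b = token_ids.size(0)
        x = self.embed(token_ids)
        x = torch.cat([self.prompt.expand(b, -1, -1), x], dim=1)
        h = self.encoder(x)[:, PROMPT_LEN:]            # drop prompt positions
        return h, self.head(h)


def supervised_contrastive_loss(reps, labels, temp=0.1):
    """Pull together tokens with the same metadata label, push apart the rest."""
    reps = F.normalize(reps, dim=-1)
    sim = reps @ reps.t() / temp                       # (N, N) similarity matrix
    mask = labels.unsqueeze(0).eq(labels.unsqueeze(1)).float()
    mask.fill_diagonal_(0)                             # exclude self-pairs
    logits = sim - torch.eye(len(labels), device=sim.device) * 1e9
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    pos_per_row = mask.sum(1).clamp(min=1)
    return -(mask * log_prob).sum(1).div(pos_per_row).mean()


# Toy training step on random data, only to show how the two losses combine.
model = PromptTagger()
opt = torch.optim.AdamW(model.parameters(), lr=1e-3)
tokens = torch.randint(0, VOCAB, (4, 20))              # 4 references, 20 tokens
gold = torch.randint(0, len(LABELS), (4, 20))

reps, logits = model(tokens)
ce = F.cross_entropy(logits.reshape(-1, len(LABELS)), gold.reshape(-1))
scl = supervised_contrastive_loss(reps.reshape(-1, DIM), gold.reshape(-1))
loss = ce + 0.5 * scl                                  # loss weighting is a guess
loss.backward()
opt.step()
print(f"ce={ce.item():.3f}  contrastive={scl.item():.3f}")
```

In this sketch the cross-entropy term drives label assignment while the contrastive term shapes the representation space so that tokens of the same metadata type cluster together, which is the intuition behind combining the two objectives; the abstract does not specify how the paper weights or schedules them.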
