Abstract

Cross-lingual text summarization is a highly desired service for overseas report editing tasks and is deployed as a distributed application to support collaboration among editors. A multilingual pre-trained language model (MPLM) can generate high-quality cross-lingual summaries with simple fine-tuning. However, the MPLM does not adapt well to complex variations between languages, such as differences in word order and tense. When the model is applied to languages with distinct syntactic structures and word morphologies, the quality of the cross-lingual summary degrades. The problem worsens when the cross-lingual summarization datasets are low-resource. To address these issues, we use a knowledge distillation framework for the cross-lingual summarization task. By learning from a monolingual teacher model, the cross-lingual student model can effectively capture the differences between languages. Since the teacher and student models generate summaries in two different languages, their representations lie in different vector spaces. To relate representations across languages, we further propose a similarity metric based on bidirectional semantic alignment that maps the representations of different languages into the same space. To further improve the quality of cross-lingual summaries, we use contrastive learning to make the student model focus on the differences between languages. Contrastive learning also strengthens the similarity metric's ability to perform bidirectional semantic alignment. Our experiments show that our approach is competitive in low-resource scenarios on cross-lingual summarization datasets for pairs of distant languages.
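
The abstract does not specify the form of the similarity metric or of the contrastive objective. As a rough illustration only, the sketch below assumes cosine similarity between pooled teacher and student representations and an InfoNCE-style contrastive loss applied in both directions (student to teacher and teacher to student). The function names, the temperature value, and the toy vectors are all hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, candidates, temperature=0.1):
    """Mean InfoNCE loss; candidates[i] is the positive for anchors[i],
    and every other candidate in the batch acts as a negative."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, c) / temperature for c in candidates]
        peak = max(logits)  # subtract the max for numerical stability
        log_z = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_z - logits[i]
    return total / len(anchors)


def bidirectional_alignment_loss(student, teacher, temperature=0.1):
    """Assumed bidirectional form: average the student->teacher and
    teacher->student contrastive losses."""
    return 0.5 * (info_nce(student, teacher, temperature)
                  + info_nce(teacher, student, temperature))


# Toy batch of hypothetical pooled representations (one row per document).
teacher = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]   # student close to its matching teacher row
swapped = [[0.1, 0.9], [0.9, 0.1]]   # student close to the wrong teacher row

loss_aligned = bidirectional_alignment_loss(aligned, teacher)
loss_swapped = bidirectional_alignment_loss(swapped, teacher)
```

Under these assumptions, a student whose representations sit near the matching teacher representations receives a much lower loss than one whose representations match the wrong documents, which is the behavior a contrastive alignment term is meant to reward.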
