Abstract

This article addresses the problem of generating multilingual summaries of scientific documents. We propose a method based on the summarization-translation approach, which decomposes the original task into two subtasks: monolingual document summarization and multilingual summarization. In the first subtask, a monolingual summary is generated in the language of the document. In the second, the resulting summary is translated into the target language. We compare several abstractive and extractive models to choose the monolingual summarization model; the best model is selected using the ROUGE metric together with newly proposed metrics. The multilingual summarization model combines the Moses statistical machine translation system with post-processing based on the mT5 transformer model. The proposed system was tested on a Wikipedia dataset covering 15 languages, and the results show that it can generate multilingual summaries in all 15 of them.
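The two-stage decomposition described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the names `rouge1_f1`, `select_summarizer`, `summarize_multilingual`, `lead_sentence`, `last_sentence`, and `tag_translator` are hypothetical, the unigram-overlap F1 is a simplified stand-in for the ROUGE metric (the abstract does not say which ROUGE variant is used, and the newly proposed metrics are not reproduced), and the toy translator merely marks where a Moses model with mT5 post-processing would be called.

```python
from collections import Counter
from typing import Callable, Dict, List

# Assumed interfaces (not from the paper):
Summarizer = Callable[[str], str]        # document -> summary, same language
Translator = Callable[[str, str], str]   # (text, target language code) -> text


def rouge1_f1(candidate: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap on whitespace tokens."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def select_summarizer(candidates: Dict[str, Summarizer],
                      docs: List[str], refs: List[str]) -> str:
    """Return the name of the candidate with the highest mean score."""
    def mean_score(summarize: Summarizer) -> float:
        scores = [rouge1_f1(summarize(d), r) for d, r in zip(docs, refs)]
        return sum(scores) / len(scores)
    return max(candidates, key=lambda name: mean_score(candidates[name]))


def summarize_multilingual(document: str, summarizer: Summarizer,
                           translator: Translator,
                           target_langs: List[str]) -> Dict[str, str]:
    """Stage 1: summarize once. Stage 2: translate that summary per language."""
    summary = summarizer(document)
    return {lang: translator(summary, lang) for lang in target_langs}


# Toy stand-ins so the sketch runs without any trained models.
def lead_sentence(text: str) -> str:
    return text.split(".")[0].strip() + "."


def last_sentence(text: str) -> str:
    parts = [p.strip() for p in text.split(".") if p.strip()]
    return parts[-1] + "."


def tag_translator(text: str, lang: str) -> str:
    # Placeholder for a real translation step plus post-processing.
    return f"[{lang}] {text}"


doc = "Transformers summarize papers. Appendix lists hyperparameters."
ref = "Transformers summarize papers."
best = select_summarizer({"lead": lead_sentence, "last": last_sentence},
                         [doc], [ref])
summaries = summarize_multilingual(doc, lead_sentence, tag_translator,
                                   ["de", "fr"])
```

The point of the design is that the document is summarized only once, so supporting an additional language costs one more translation call rather than another summarization model.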
