Abstract

We propose an integrated text summarization and text-to-speech framework that summarizes Korean documents into a few sentences and reads them aloud in a specific person's voice. In our framework, a pre-trained text summarization model (KoBART) is fine-tuned on an additional news-oriented text summarization dataset. The fine-tuned model is then compressed via knowledge distillation (DistilKoBART) to improve computational efficiency. For text-to-speech, Tacotron 2 and WaveGlow models are used. To generate natural speech, we design a task-specific transliteration module that converts numeric and English expressions into Korean. The experimental results show that the proposed framework effectively summarizes long documents and produces human-like synthesized speech. The framework can deliver information quickly to busy users and can also serve users in special situations, such as drivers or people with low vision.
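As a rough illustration of the summarize-then-speak pipeline described above (not the authors' released code), the summarization stage can be sketched with the Hugging Face transformers API. The model identifier, the transliteration helper, and the downstream TTS step are assumptions or placeholders for illustration only.

```python
# Illustrative sketch of the summarize-then-speak pipeline (not the authors' code).
# The model ID, the transliteration helper, and the TTS stage are placeholders.
from transformers import BartForConditionalGeneration, PreTrainedTokenizerFast

# Assumed public Korean BART summarization checkpoint, standing in for the
# fine-tuned / distilled model described in the abstract.
MODEL_ID = "gogamza/kobart-summarization"

tokenizer = PreTrainedTokenizerFast.from_pretrained(MODEL_ID)
model = BartForConditionalGeneration.from_pretrained(MODEL_ID)


def summarize(document: str, max_summary_tokens: int = 128) -> str:
    """Summarize a Korean document into a few sentences."""
    inputs = tokenizer(document, return_tensors="pt", truncation=True, max_length=1024)
    summary_ids = model.generate(
        inputs["input_ids"],
        num_beams=4,
        max_length=max_summary_tokens,
        early_stopping=True,
    )
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)


def transliterate(text: str) -> str:
    """Placeholder for the task-specific module that rewrites numeric and
    English expressions as Korean readings before speech synthesis."""
    ...


# Downstream, the transliterated summary would be passed to a Tacotron 2 +
# WaveGlow stack to synthesize the final waveform in the target voice.
```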
