Abstract

We propose an integrated text summarization and text-to-speech framework that summarizes Korean documents into a few sentences and reads them aloud in a specific person's voice. In our framework, a pre-trained text summarization model (KoBART) is fine-tuned on an additional news-oriented text summarization dataset. The fine-tuned model is then compressed via knowledge distillation (DistilKoBART) to improve computational efficiency. For text-to-speech, Tacotron 2 and WaveGlow models are used. To generate natural speech, we design a task-specific transliteration module that converts numeric and English expressions into Korean. The experimental results show that the proposed framework effectively summarizes long documents and produces human-like synthesized speech. The framework can deliver information quickly to busy users and can also serve users in special situations, such as drivers or people with low vision.
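As a rough illustration of the summarize-then-speak pipeline described above (not the authors' released code), the summarization stage can be sketched with the Hugging Face transformers API. The model identifier, the transliteration helper, and the downstream TTS step are assumptions or placeholders for illustration only.

```python
# Illustrative sketch of the summarize-then-speak pipeline (not the authors' code).
# The model ID, the transliteration helper, and the TTS stage are placeholders.
from transformers import BartForConditionalGeneration, PreTrainedTokenizerFast

# Assumed public Korean BART summarization checkpoint, standing in for the
# fine-tuned / distilled model described in the abstract.
MODEL_ID = "gogamza/kobart-summarization"

tokenizer = PreTrainedTokenizerFast.from_pretrained(MODEL_ID)
model = BartForConditionalGeneration.from_pretrained(MODEL_ID)


def summarize(document: str, max_summary_tokens: int = 128) -> str:
    """Summarize a Korean document into a few sentences."""
    inputs = tokenizer(document, return_tensors="pt", truncation=True, max_length=1024)
    summary_ids = model.generate(
        inputs["input_ids"],
        num_beams=4,
        max_length=max_summary_tokens,
        early_stopping=True,
    )
    return tokenizer.decode(summary_ids[0], skip_special_tokens=True)


def transliterate(text: str) -> str:
    """Placeholder for the task-specific module that rewrites numeric and
    English expressions as Korean readings before speech synthesis."""
    ...


# Downstream, the transliterated summary would be passed to a Tacotron 2 +
# WaveGlow stack to synthesize the final waveform in the target voice.
```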
