Abstract

Abstractive text summarization is the task of generating a summary that captures the main content of a text document. The pointer-generator network, a state-of-the-art method for abstractive summarization, produces fluent summaries and addresses two common shortcomings: inaccurate reproduction of factual details and phrase repetition. Although this network can copy out-of-vocabulary (OOV) words, it cannot fully represent them in context and may therefore lose information. This paper aims to improve the quality of abstractive summarization by adding a pretrained word-embedding layer to the pointer-generator network. This layer helps preserve the meaning of words across a wider range of contexts and ensures that every word has its own representation, even when it is absent from the vocabulary. We modify the network with two recent word-embedding mechanisms, Word2vec and FastText, to represent the semantic information of words more accurately. OOV words that would otherwise be marked as unknown tokens now receive proper embeddings and are taken into account during summary generation. Experiments on the CNN/Daily Mail corpus show that the new mechanism outperforms the plain pointer-generator network on all three ROUGE scores (R1, R2, RL).
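The key idea of using FastText here is that an OOV word can still receive a meaningful embedding composed from its character n-grams, rather than collapsing to a single unknown-token vector. The sketch below illustrates that composition in isolation; it is a toy stand-in, not the paper's implementation, and all names, dimensions, and the hash-derived "weights" are illustrative assumptions.

```python
import hashlib

DIM = 8          # embedding dimension (illustrative assumption)
BUCKETS = 1000   # hash buckets for n-grams (illustrative assumption)

def ngrams(word, n_min=3, n_max=5):
    """Character n-grams of the word wrapped in boundary markers, as FastText does."""
    w = f"<{word}>"
    return [w[i:i + n] for n in range(n_min, n_max + 1) for i in range(len(w) - n + 1)]

def bucket_vector(gram):
    """Deterministic pseudo-embedding for one n-gram (stands in for trained weights)."""
    h = int(hashlib.md5(gram.encode()).hexdigest(), 16) % BUCKETS
    # derive a repeatable toy vector from the bucket id
    return [((h * (i + 1)) % 97) / 97.0 for i in range(DIM)]

def embed(word):
    """Average the word's n-gram vectors: even an OOV word gets a real embedding."""
    vecs = [bucket_vector(g) for g in ngrams(word)]
    return [sum(component) / len(vecs) for component in zip(*vecs)]

# A word absent from any vocabulary still maps to a DIM-sized vector,
# so the encoder sees more than an unknown token.
oov_vector = embed("pointergeneratorish")
print(len(oov_vector))
```

In the modified network described by the abstract, such subword-composed vectors would replace the shared unknown-token embedding, so the copy mechanism and the generator both condition on a representation specific to each OOV word.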
