Abstract
Text summarization has long been one of the key research areas in Natural Language Processing (NLP). Methods for summarizing one or more documents can be broadly classified as extractive or abstractive. Extractive summarization selects key parts of the documents and places them in the summary while balancing salience against redundancy. Abstractive summarization creates new sentences to summarize the documents. Extractive summarization can further be done in a supervised manner, with human involvement, or in an unsupervised manner, without any human intervention. This paper surveys a few of the current methods for extractive text summarization where the input is a multi-document set. Multi-document summarization can consider two types of document sets: a homogeneous set, in which the documents share a common topic or theme, and a heterogeneous set, in which the main topics of the documents are unrelated but the documents contain some form of information that is related to the summary. The first method uses sentence regression, performing sentence ranking together with sentence relations, followed by a greedy selection process. The second is an unsupervised paragraph embedding method that uses density peaks clustering. The third method proposes document-level reconstruction using a neural document model. The fourth method is a query-focused, joint neural network based model with an attention mechanism. The fifth method concentrates on coherence by providing a graph-based model that does not require discourse analysis as a prerequisite. We also describe a way to create a heterogeneous multi-document corpus and discuss the limitations of each of these methods.
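To make the trade-off between salience and redundancy concrete, the following is a minimal sketch of greedy extractive selection. It is not the implementation of any of the surveyed methods: the bag-of-words vectors, the centroid-based salience score, the function names (`tokenize`, `cosine`, `greedy_summary`), and the `redundancy_weight` parameter are all assumptions made for illustration. A real system would replace the salience score with a learned sentence ranker or another scoring model.

```python
import re
from collections import Counter
from math import sqrt


def tokenize(sentence):
    # Lowercase and split into alphanumeric tokens (illustrative choice).
    return re.findall(r"[a-z0-9]+", sentence.lower())


def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def greedy_summary(sentences, max_sentences=2, redundancy_weight=0.7):
    # Hypothetical scoring: salience is similarity to the centroid of all
    # sentences; redundancy is the highest similarity to any sentence
    # that has already been selected.
    vectors = [Counter(tokenize(s)) for s in sentences]
    centroid = Counter()
    for v in vectors:
        centroid.update(v)
    salience = [cosine(v, centroid) for v in vectors]

    selected = []
    while len(selected) < min(max_sentences, len(sentences)):
        best, best_score = None, float("-inf")
        for i in range(len(sentences)):
            if i in selected:
                continue
            redundancy = max(
                (cosine(vectors[i], vectors[j]) for j in selected), default=0.0
            )
            score = salience[i] - redundancy_weight * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)

    # Return the chosen sentences in their original order.
    return [sentences[i] for i in sorted(selected)]


if __name__ == "__main__":
    docs = [
        "the cat sat on the mat",
        "the cat sat on the mat",
        "dogs bark loudly at night",
    ]
    print(greedy_summary(docs, max_sentences=2))
```

In this toy input the second sentence duplicates the first, so once the first is chosen the redundancy penalty lowers the duplicate's score and the unrelated third sentence is selected instead. The weight of 0.7 is arbitrary; setting it to 0 reduces the procedure to plain salience ranking.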