Abstract

In this study we propose an automatic single document text summarization technique using Latent Semantic Analysis (LSA) and diversity constraint in combination. The proposed technique uses the query based sentence ranking. Here we are not considering the concept of IR (Information Retrieval) so we generate the query by using the TF-IDF(Term Frequency-Inverse Document Frequency). For producing the query vector, we identify the terms having the high IDF. We know that LSA utilizes the vectorial semantics to analyze the relationships between documents in a corpus or between sentences within a document and key terms they carry by producing a list of ideas interconnected to the documents and terms. LSA helps to represent the latent structure of documents. For selecting the sentences from the document Latent Semantic Indexing (LSI) is used. LSI helps to arrange the sentences with its score. Traditionally the highest score sentences have been chosen for summary but here we calculate the diversity between chosen sentences and produce the final summary as a good summary should have maximum level of diversity. The proposed technique is evaluated on OpinosisDataset1.0.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.