Abstract

Due to the large amount of data published on the Internet, tasks related to the automatic generation of summaries from unstructured sources have gained enormous popularity in recent years. Their applications include media monitoring, newsletter generation, legal document analysis, virtual assistants that summarize email overload, e-learning, and patent research, among others. One popular approach for generating summaries is extractive summarization, which selects the most meaningful sentences in a document and presents them to the reader in a concise, comprehensible form. To the best of our knowledge, there is a lack of studies that evaluate extractive text summarization techniques in Spanish, especially novel techniques based on state-of-the-art transformers. Consequently, we benchmark traditional and recent approaches to text summarization on the Corpus-TER dataset, which consists of 240 Mexican-Spanish news articles. Our preliminary results suggest that Word2Vec word embeddings achieve the best results according to the ROUGE-1, BLEU, and edit distance metrics.
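The extractive pipeline and evaluation described in the abstract can be illustrated with a minimal, standard-library-only sketch. It is not the benchmark's code: it swaps in a simple word-frequency sentence scorer in place of Word2Vec embeddings or transformers, and it assumes a naive regex sentence splitter and a small hypothetical Spanish stopword list (`STOPWORDS`). It computes ROUGE-1 F1 and a Levenshtein edit distance; BLEU is omitted, and whether the paper measures edit distance over characters or tokens is not stated here, so `edit_distance` accepts either.

```python
import re
from collections import Counter
from typing import Sequence

# Hypothetical, deliberately tiny Spanish stopword list (illustration only).
STOPWORDS = {"el", "la", "los", "las", "de", "del", "y", "en", "a",
             "que", "un", "una", "por", "con", "para", "se"}


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break on whitespace that follows '.', '!' or '?'.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text: str) -> list[str]:
    # Lowercase word tokens; \w is Unicode-aware, so accented letters are kept.
    return re.findall(r"\w+", text.lower())


def content_words(text: str) -> list[str]:
    # Tokens with stopwords removed; used only for sentence scoring.
    return [t for t in tokenize(text) if t not in STOPWORDS]


def extractive_summary(text: str, k: int = 2) -> str:
    # Score each sentence by the mean document frequency of its content words,
    # keep the k best (ties keep document order), and emit them in original order.
    sentences = split_sentences(text)
    freqs = Counter(content_words(text))

    def score(sentence: str) -> float:
        words = content_words(sentence)
        return sum(freqs[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    return " ".join(sentences[i] for i in sorted(ranked[:k]))


def rouge1_f1(candidate: str, reference: str) -> float:
    # Clipped unigram overlap between candidate and reference, combined as F1.
    cand, ref = Counter(tokenize(candidate)), Counter(tokenize(reference))
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def edit_distance(a: Sequence, b: Sequence) -> int:
    # Levenshtein distance (insert, delete, substitute; each costs 1) using a
    # rolling row. Works on strings (character level) or token lists (word level).
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (x != y)))   # substitution
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    doc = ("El gobierno anunció un nuevo plan económico. "
           "El plan económico incluye apoyos fiscales. "
           "Llovió en la tarde.")
    summary = extractive_summary(doc, k=2)
    reference = "El gobierno presentó un plan económico con apoyos fiscales."
    print(summary)
    print(rouge1_f1(summary, reference))
    print(edit_distance(tokenize(summary), tokenize(reference)))
```

Keeping the selected sentences in document order, rather than score order, is a common choice in extractive systems because it preserves the narrative flow of news articles; an embedding-based scorer (for example, similarity of each sentence vector to the document centroid) could replace `score` without changing the rest of the sketch.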
