Abstract
Automatic text summarization is an emerging field of research in Natural Language Processing. This work is a novel attempt to bring a low-resource language, Konkani, into the domain of automatic text summarization. We use supervised machine learning algorithms to perform single-document extractive summarization of Konkani documents. In particular, we propose training these algorithms on language-independent features extracted from a Konkani dataset devised specifically for this experiment from books of Konkani folktale literature. We treat summarization as a binary classification problem: once trained, the algorithms classify each sentence by its relevance, and the sentences classified as relevant form the summary. We then evaluate popular linear and non-linear supervised machine learning algorithms using k-fold cross-validation, and compare the system-generated summaries with human-generated summaries to verify their effectiveness. The results show that the linear models perform better than the non-linear models, although all models beat the baselines. The proposed methodology produces promising summaries without requiring any language-specific domain knowledge.
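The pipeline described above (language-independent sentence features, binary sentence classification, k-fold evaluation) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not list the features or name the algorithms, so the three features (position, length, average term frequency), the use of logistic regression as a stand-in linear model, and all function names are hypothetical rather than the authors' implementation.

```python
import math
from collections import Counter


def sentence_features(sentences):
    """Return [position, length, avg_term_freq] per sentence, each in [0, 1].

    Hypothetical feature set: only whitespace tokenization is used, so no
    language-specific resources (stemmers, stop-word lists) are required.
    """
    tokens = [s.lower().split() for s in sentences]
    term_freq = Counter(t for toks in tokens for t in toks)
    max_len = max((len(toks) for toks in tokens), default=0) or 1
    max_tf = max(term_freq.values(), default=1)
    n = len(sentences)
    features = []
    for i, toks in enumerate(tokens):
        position = 1.0 - i / n  # earlier sentences score higher
        length = len(toks) / max_len
        avg_tf = sum(term_freq[t] for t in toks) / (len(toks) * max_tf) if toks else 0.0
        features.append([position, length, avg_tf])
    return features


def train_logistic(X, y, epochs=200, lr=0.5):
    """Fit a logistic regression by stochastic gradient ascent.

    Returns the weights with the bias stored as the last element.
    """
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        for x, label in zip(X, y):
            z = w[-1] + sum(wi * xi for wi, xi in zip(w, x))
            p = 1.0 / (1.0 + math.exp(-z))
            err = label - p
            for j, xj in enumerate(x):
                w[j] += lr * err * xj
            w[-1] += lr * err
    return w


def predict(w, x):
    """Label a sentence 1 (include in summary) or 0 (leave out)."""
    z = w[-1] + sum(wi * xi for wi, xi in zip(w, x))
    return 1 if z >= 0 else 0


def k_fold_indices(n, k):
    """Split indices 0..n-1 into k disjoint folds (assumes n >= k)."""
    return [list(range(i, n, k)) for i in range(k)]


def cross_validate(X, y, k=5):
    """Mean held-out accuracy over k folds."""
    scores = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_X = [x for i, x in enumerate(X) if i not in held_out]
        train_y = [t for i, t in enumerate(y) if i not in held_out]
        w = train_logistic(train_X, train_y)
        correct = sum(predict(w, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)
```

In this sketch, a summary would be assembled by calling `sentence_features` on a document's sentences, applying `predict` to each feature vector, and keeping the sentences labelled 1 in their original order. The labels used for training would come from human-generated summaries, as in the evaluation the abstract describes.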