Abstract

We present an algorithm for keyword extraction from a Bengali document. In natural language processing (NLP), keyword extraction is the automated process of identifying a set of terms that represent the information discussed in a document. Much research has been done on keyword extraction for resource-rich languages. Some of that work followed a supervised approach using a specific corpus, whereas the latest techniques use an unsupervised approach, and keyword extraction has already achieved state-of-the-art performance for those languages. Only a few studies have addressed keyword extraction for documents in Bengali, and none of them achieved more than 70% accuracy. In this article, we discuss methods for extracting Bengali keywords from a specific document collection following an unsupervised learning approach. Bengali keyword extraction is generally difficult because of word parsing, stemming, stop-word removal, and similar preprocessing steps, and the accuracy of those modules also affects the performance of the keyword extraction procedure. Nevertheless, we obtained 87% accuracy in identifying the correct Bengali keywords from a document. The procedure we discuss can also be applied to other languages, but here we provide all of our experimental results specifically for Bengali.
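
To make the kind of pipeline described above concrete, the following is a minimal sketch of unsupervised keyword extraction over a small document collection. It is not the method of this paper, whose details are not given in the abstract: the scoring function (TF-IDF with a smoothed IDF), the regular-expression tokenizer over the Bengali Unicode block, the tiny stop-word list, and the names `tokenize` and `extract_keywords` are all assumptions chosen for illustration, and stemming is omitted entirely.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (assumption for illustration);
# a real system would need a curated Bengali stop-word list.
STOP_WORDS = {"এবং", "ও", "এই", "যে", "একটি", "থেকে", "জন্য", "করে"}

# Runs of characters from the Bengali Unicode block (U+0980 to U+09FF).
# The danda sentence marker (U+0964) lies outside this range, so it is dropped.
TOKEN_RE = re.compile(r"[\u0980-\u09FF]+")


def tokenize(text):
    """Split text into Bengali word tokens and drop stop words (no stemming)."""
    return [tok for tok in TOKEN_RE.findall(text) if tok not in STOP_WORDS]


def extract_keywords(docs, doc_index, top_k=5):
    """Rank the terms of docs[doc_index] by TF-IDF computed over all docs."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    # Term frequency within the target document.
    tf = Counter(tokenized[doc_index])
    total = sum(tf.values())

    # Relative term frequency times a smoothed inverse document frequency.
    scores = {
        term: (count / total) * (math.log((1 + n_docs) / (1 + df[term])) + 1.0)
        for term, count in tf.items()
    }

    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_k]]


docs = [
    "ক্রিকেট খেলা এবং ক্রিকেট দল",
    "ফুটবল খেলা এবং ফুটবল দল",
    "রাজনীতি এবং নির্বাচন",
]
top_terms = extract_keywords(docs, 0, top_k=3)
```

In this toy collection, a term that is frequent in the first document but absent from the others is ranked above terms shared across documents. A stemmer would be needed to merge inflected forms of the same word, which is one of the difficulties the abstract names for Bengali.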
