Abstract

An Information Retrieval System helps a user trace relevant information by means of Natural Language Processing (NLP). In this paper, we present an algorithmic Bengali Information Retrieval System (BIRS) that works on a supplied body of information and is mathematically and statistically significant. We demonstrate two algorithms for lemmatizing Bengali words, Trie and Dictionary Based Search by Removing Affix (DBSRA), and compare them with Edit Distance for exact lemmatization. We present a Bengali anaphora resolution system that uses Hobbs' algorithm to obtain the correct expression of information. For question answering, TF-IDF and Cosine Similarity are used to find the accurate answer in the documents. We also introduce a Bengali Language Toolkit (BLTK) and Bengali Language Expression (BRE), which simplify the implementation of our task. In addition, we have developed a Bengali root word corpus, a synonym word corpus, and a stop word corpus, and gathered 672 articles from the popular Bengali newspaper 'The Daily Prothom Alo' as our input information. To test the system, we created 19335 questions from this information and obtained 97.22% accurate answers.

Highlights

  • Information Retrieval (IR) refers to retrieving information from a collection of sources based on a relevant query

  • We introduced a Bengali Information Retrieval System (BIRS) based on Bengali Natural Language Processing (BNLP)

  • For the implementation of the Bengali Information Retrieval System (BIRS), we mainly described five types of corpus


Summary

INTRODUCTION

Information Retrieval (IR) refers to retrieving information from a collection of sources based on a relevant query. A huge amount of information is produced by newspapers, social networking sites, and many other kinds of websites. Because these collections of digital documents on the web or on a local machine are so large, finding the desired information is a tedious process. Finding relevant information for a query poses several challenges. One is word mismatch: a sentence can be written in different ways that share the same meaning but differ in structure, and a question can be formulated in different ways using synonymous words. Retrieving the desired information is therefore a challenging and difficult task. The BM25 ranking algorithm works well in different tasks [7]. More advanced methods such as the Relevance-Based Language Models (or Relevance Models for short, RM) are among the best-performing text retrieval ranking techniques [8]. Our main objective is to retrieve relevant information within a short time and with high accuracy.
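To make the retrieval step concrete, the following is a minimal sketch of ranking documents against a query with TF-IDF weights and cosine similarity, the two measures named in the abstract. It is an illustration under stated assumptions, not the system's actual implementation: the function names (`tf_idf_vectors`, `cosine`, `rank`), the smoothed IDF formula, and the English placeholder tokens are all hypothetical, and the documents are assumed to be already tokenized, cleaned, and lemmatized.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Build one sparse {term: weight} vector per tokenized document.

    Assumption: weight = (relative term frequency) * (log(N / df) + 1).
    The "+ 1" keeps terms that occur in every document from vanishing.
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n_docs / df) + 1.0 for term, df in doc_freq.items()}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({t: (c / len(doc)) * idf[t] for t, c in counts.items()})
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Return (score, document index) pairs, best match first."""
    vectors, idf = tf_idf_vectors(docs)
    # Query terms never seen in the collection carry no weight.
    counts = Counter(t for t in query if t in idf)
    total = sum(counts.values())
    query_vec = {t: (c / total) * idf[t] for t, c in counts.items()}
    scores = [(cosine(query_vec, vec), i) for i, vec in enumerate(vectors)]
    return sorted(scores, reverse=True)


# Placeholder tokens standing in for pre-processed Bengali text.
docs = [
    ["dhaka", "capital", "bangladesh"],
    ["cricket", "team", "won", "match"],
    ["river", "padma", "flows", "bangladesh"],
]
query = ["capital", "bangladesh"]
ranked = rank(query, docs)
for score, index in ranked:
    print(index, round(score, 3))
```

In this toy collection the first document shares both query terms and ranks highest, the third shares one term, and the second shares none and scores zero. Mapping synonymous words to a common form before this step, as the word mismatch discussion suggests, would let differently worded questions reach the same documents.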

RELATED WORK
PROPOSED WORK
Category
Corpus
Pre-Processing
Anaphora
Tokenization
Cleaning
Stop Words Removing
Lemmatization for Bangla Language
Synonyms Words Processing
TF-IDF
Cosine Similarity
Experimental Tools
Final Result
Findings
CONCLUSION AND FUTURE WORK
