Abstract

Finding the semantically accurate answer is one of the key challenges in advanced searching. In contrast to keyword-based searching, the meaning of the question or query matters here, and answers are ranked according to relevance. It is common for the question sentence and the answer sentence to share almost no words. This paper describes an approach for finding semantically relevant answers in a Bengali dataset. In the first phase of the algorithm, a set of statistical parameters such as frequency, index, and part-of-speech (POS) is matched between a question and the probable answers. In the second phase, entropy and similarity are calculated in separate modules. Finally, a sense score is generated to rank the answers. The algorithm is tested on a repository containing a total of 275,000 sentences. This Bengali repository is a product of the Technology Development for Indian Languages (TDIL) project sponsored by the Government of India and was provided by the Language Research Unit of the Indian Statistical Institute, Kolkata. The shallow parser developed by the LTRC group of IIIT Hyderabad is used for POS tagging. The actual answer is ranked 1st in 82.3% of cases and within the top five in 90.0% of cases. Computed from the confusion matrix, the accuracy of the system is 97.32% and its precision is 98.14%. The challenges and pitfalls of the work are reported at the end of the paper.
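The reported figures are of two kinds: rank-based rates (actual answer ranked 1st, or within the top five) and confusion-matrix metrics (accuracy and precision). The following is a minimal sketch of how such figures are conventionally computed, assuming the standard definitions; the abstract does not state the exact formulas used. The function names and the toy inputs are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def top_k_rate(ranks: Sequence[int], k: int) -> float:
    """Fraction of questions whose actual answer is ranked within the top k.

    `ranks` holds the 1-based rank of the actual answer for each question.
    """
    if not ranks:
        raise ValueError("ranks must not be empty")
    return sum(1 for r in ranks if 1 <= r <= k) / len(ranks)


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Standard accuracy: correct decisions over all decisions."""
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return (tp + tn) / total


def precision(tp: int, fp: int) -> float:
    """Standard precision: true positives over all predicted positives."""
    if tp + fp == 0:
        raise ValueError("no positive predictions")
    return tp / (tp + fp)


# Toy values for illustration only; these are NOT results from the paper.
toy_ranks = [1, 1, 3, 1, 7, 2, 1, 1, 5, 12]

if __name__ == "__main__":
    print("ranked 1st:  ", top_k_rate(toy_ranks, 1))
    print("within top 5:", top_k_rate(toy_ranks, 5))
    print("accuracy:    ", accuracy(tp=8, fp=1, fn=1, tn=10))
    print("precision:   ", precision(tp=8, fp=1))
```

Under these definitions, the top-five rate can never be lower than the 1st-rank rate, which is consistent with the 82.3% and 90.0% figures above.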
