Abstract

Why-type non-factoid questions are ambiguous and admit considerable variation in their answers. Returning a single appropriate answer to the user requires answer extraction, re-ranking and validation. In many cases the system must understand the meaning and context of a document rather than match the exact words of the question. This paper addresses the problem by exploring lexico-syntactic, semantic and contextual query-dependent features, some based on deep learning frameworks, that estimate the probability of an answer candidate being relevant to the question. The features are weighted by the feature-importance scores returned by an ensemble ExtraTreesClassifier. An answer re-ranker model is implemented that selects the answer candidate with the largest weighted feature similarity to the question, achieving a Mean Reciprocal Rank (MRR) of 0.64. Finally, the answer is validated by matching the answer type of each candidate, and the highest-ranked candidate with a matching answer type is returned to the user.
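
As a rough illustration of the pipeline described above, the sketch below weights question-answer similarity features by the feature importances of a scikit-learn ExtraTreesClassifier and re-ranks answer candidates by their weighted feature similarity. It is a minimal sketch under assumed data and feature dimensions, not the authors' implementation.

```python
# Minimal sketch (not the authors' implementation): weight question-answer
# similarity features by ExtraTreesClassifier feature importances and re-rank
# answer candidates by weighted feature similarity. All data and dimensions
# here are illustrative assumptions.
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

rng = np.random.default_rng(0)

# X: one row per (question, answer-candidate) pair; columns are lexico-syntactic,
# semantic and contextual similarity features. y: 1 if the candidate correctly
# answers the question, else 0.
X_train = rng.random((200, 5))            # placeholder feature matrix
y_train = rng.integers(0, 2, 200)         # placeholder relevance labels

clf = ExtraTreesClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
weights = clf.feature_importances_        # one importance score per feature

def rerank(candidate_features):
    """Return candidate indices ordered by importance-weighted feature similarity."""
    scores = candidate_features @ weights  # weighted sum of similarities per candidate
    return np.argsort(-scores)

candidates = rng.random((10, 5))           # similarity features for 10 candidates
print(rerank(candidates))                  # best-ranked candidate index first
```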

Highlights

  • IBM’s Watson (IBM Watson, 2020) has shown remarkable results in answering open-domain questions

  • There are cases where the need is to understand the meaning and context of a document rather than find the exact words of the question; the paper addresses this problem by exploring lexico-syntactic, semantic and contextual query-dependent features, some based on deep learning frameworks, that estimate the probability of an answer candidate being relevant to the question

  • Various features covering lexico-syntactic, semantic and contextual similarities are employed to estimate the relevance of each answer candidate to a question


Summary

INTRODUCTION

The advent of IBM’s Watson (IBM Watson, 2020) has shown remarkable results in answering open-domain questions. Research in the question answering domain has achieved high accuracy, around 85%, in answering factoid-type questions. Work by Verberne et al. (2010), Jansen and Surdeanu (2014), Fried and Jansen (2015), and Oh et al. (2012, 2013) has been successful in answering open-domain non-factoid questions, while Tran and Niederee (2018) investigated deep learning frameworks for answering insurance- and financial-domain non-factoid questions; performance, however, remains lower than that of factoid QAS such as IBM Watson. An answer re-ranker is developed that explores a set of features based on the similarity between a question and its answer candidates, weighted by feature-importance scores. The method achieves a Mean Reciprocal Rank (MRR) of 0.64, which significantly improves over previous work on why-type answer re-ranking.
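
For reference, the MRR metric reported above averages, over all questions, the reciprocal of the rank at which the first correct answer appears in the re-ranked list. The sketch below shows this computation on hypothetical data; it is an illustration, not the authors' evaluation code.

```python
# Illustrative computation of Mean Reciprocal Rank (MRR): for each question, take
# the reciprocal of the rank of the first correct answer in the re-ranked list,
# then average over questions. The example data is hypothetical.
def mean_reciprocal_rank(ranked_relevance):
    """ranked_relevance: one 0/1 list per question, in ranked (best-first) order."""
    total = 0.0
    for labels in ranked_relevance:
        reciprocal = 0.0
        for rank, is_correct in enumerate(labels, start=1):
            if is_correct:
                reciprocal = 1.0 / rank
                break
        total += reciprocal
    return total / len(ranked_relevance)

# Correct answer ranked 1st, 2nd and 4th for three hypothetical questions:
print(mean_reciprocal_rank([[1, 0, 0], [0, 1, 0], [0, 0, 0, 1]]))  # ~0.583
```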
