Abstract

In this paper we analyze the contribution of semantic, syntactic, and document word-similarity features to closed- and open-domain question answering. Semantic similarity is computed as the similarity between the action in the candidate sentence and the action asked about in the question, measured by applying WordNet::Similarity to the main verbs. The syntactic similarity feature measures the unifiability of a candidate's parse tree with the question's parse tree, using syntactic restrictions as well as lexical measures to compute the unifiability of critical syntactic participants in the two trees. Finally, the word similarity of the document containing a candidate sentence is computed as the cosine of the angle between the question keyword vector and the document vector. Since the semantic feature is more reliable on content verbs and syntactic similarity is suited to questions with a subject-verb-object structure, we restrict our analysis to questions with a main content verb (non-copulative questions). This type comprises 70% of our closed-domain and 33% of our open-domain test questions. The combination of the three features achieves an MRR of 28% in our closed domain and 23% in the open domain. Our analysis shows that the syntactic feature contributes significantly in both the open and closed domains. However, the path-based lch (Leacock-Chodorow) semantic similarity measure we used contributes only in our closed domain, probably because of the smaller variation in vocabulary and topic. The document IR score, on the other hand, contributes more in the open domain, because query keywords are more discriminating in a large document set with a wide vocabulary range.
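To make two of the quantities named above concrete, the following is a minimal sketch of the cosine-based word similarity and of mean reciprocal rank (MRR). It is an illustration under stated assumptions, not the implementation used in the paper: raw term counts are assumed as vector weights (the abstract does not specify a weighting scheme), and the function names `cosine_similarity` and `mean_reciprocal_rank` are hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(question_keywords, document_tokens):
    """Cosine of the angle between two bag-of-words vectors.

    Assumption: raw term counts are used as weights; the paper may
    use a different scheme (for example tf-idf).
    """
    q = Counter(question_keywords)
    d = Counter(document_tokens)
    # Only terms present in the question can contribute to the dot product.
    dot = sum(q[term] * d[term] for term in q)
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    norm_d = math.sqrt(sum(v * v for v in d.values()))
    if norm_q == 0 or norm_d == 0:
        return 0.0  # an empty vector has no direction
    return dot / (norm_q * norm_d)


def mean_reciprocal_rank(first_correct_ranks):
    """MRR over questions.

    Each entry is the 1-based rank of the first correct answer for one
    question, or None if no correct answer was returned (scored as 0).
    """
    if not first_correct_ranks:
        return 0.0
    total = sum(1.0 / rank if rank else 0.0 for rank in first_correct_ranks)
    return total / len(first_correct_ranks)
```

For example, three questions whose first correct answers appear at rank 1, at rank 2, and not at all give an MRR of (1 + 1/2 + 0) / 3 = 0.5.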
