Abstract

Open-book question answering is a subset of question answering (QA) tasks in which the system finds answers in a given set of documents (open-book) and in common knowledge about a topic. This article proposes a solution for answering natural language questions from a corpus of Amazon Web Services (AWS) technical documents with no domain-specific labeled data (zero-shot). These questions have a yes–no–none answer and a text answer, which can be short (a few words) or long (a few sentences). We present a two-step, retriever–extractor architecture in which a retriever finds the right documents and an extractor finds the answers in the retrieved documents. To test our solution, we introduce a new dataset for open-book QA based on real customer questions on AWS technical documentation. In this paper, we conduct experiments on several information retrieval systems and extractive language models, attempting to find the yes–no–none answers and text answers in the same pass. Our custom-built extractor model is created from a pretrained language model and fine-tuned on the Stanford Question Answering Dataset (SQuAD) and the Natural Questions dataset. We achieve a 42% F1 score and a 39% exact match (EM) score end-to-end with no domain-specific training.

Highlights

  • Question answering (QA) has been a major area of research in artificial intelligence and machine learning since the early days of computer science [1,2,3,4]

  • We were able to achieve 42% F1 and 39% exact match score (EM)

  • QA systems are especially useful when a user searches for specific information and does not have the time—or does not want—to peruse all available documentation related to their search to solve the problem at hand

Introduction

Question answering (QA) has been a major area of research in artificial intelligence and machine learning since the early days of computer science [1,2,3,4]. Open-book QA is defined as the task whereby a system (such as a piece of computer software) answers natural language questions from a set of available documents (open-book). These questions can have yes–no–none answers, short answers, long answers, or any combination of the above. We did not train the system on our domain-specific documents or questions and answers, a technique called zero-shot learning [5]. The system should be able to perform on a variety of document types, questions, and answers without training. We define this approach as "zero-shot open-book QA". The experiments are explained, and the results are presented along with limitations and future steps.
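The two-step pipeline described above can be sketched in a few lines. This is an illustrative toy only: the paper's actual retrievers are Whoosh and Amazon Kendra, and its extractor is a transformer fine-tuned on SQuAD and Natural Questions; here both steps are replaced by simple term-overlap stand-ins, and the corpus and question are invented for the example.

```python
# Minimal sketch of a two-step retriever–extractor QA pipeline.
# Both components are naive term-overlap stand-ins for the systems
# used in the paper (Whoosh / Amazon Kendra retrievers, a fine-tuned
# transformer extractor); the corpus below is a made-up example.
from collections import Counter

def retrieve(question, documents, top_k=1):
    """Rank documents by term overlap with the question."""
    q_terms = set(question.lower().split())
    scored = []
    for doc in documents:
        terms = Counter(doc.lower().split())
        scored.append((sum(terms[t] for t in q_terms), doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]

def extract(question, document):
    """Return the sentence with the highest term overlap."""
    q_terms = set(question.lower().split())
    return max(document.split(". "),
               key=lambda s: len(q_terms & set(s.lower().split())))

corpus = [
    "S3 buckets are private by default. Access is granted via policies.",
    "EC2 instances run in a VPC. Security groups control traffic.",
]
question = "Are S3 buckets private by default?"
top_doc = retrieve(question, corpus)[0]   # step 1: find the right document
answer = extract(question, top_doc)       # step 2: find the answer span
```

Because neither step is trained on the target corpus, this mirrors the zero-shot setting: the same pipeline can be pointed at any document set without domain-specific fine-tuning.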

Related Work
QA Approaches
Our Inspirations
AWS Documentation Dataset
Questions and Answers
Annotation Process
SQuAD Datasets
Natural Questions Dataset
Approach
Whoosh
Amazon Kendra
Extractors
Extractor Model Data Processing
Extractor Model
Retriever Experiments
Extractor Experiments
Error Analysis
Exact Matches
Retriever Errors
Partial Answers
Table Extraction Errors
Wrong Predictions
Limitations and Future
Findings
Conclusions