Abstract
The task of machine reading comprehension (MRC) has evolved from answering simple questions over well-edited text to answering real questions from users over web data. In this real-world setting, the full text of multiple relevant documents from the top search results is provided as context, and the questions come from user queries, including not only questions with a single short factual answer but also questions about reasons, procedures, and opinions. Consequently, multiple answers can be equally valid for a single question, and each answer may occur multiple times in the context; both facts should be taken into account when building an MRC system. We propose a multi-answer multi-task framework in which different loss functions are used for multiple reference answers, and we apply Minimum Risk Training to handle the multiple occurrences of a single answer. Combined with a simple heuristic passage-extraction strategy for overlong documents, our model raises the ROUGE-L score on the DuReader dataset from the previous state of the art of 44.18 to 51.09.
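The paper's exact Minimum Risk Training formulation is not reproduced in this digest, but the idea can be illustrated with a minimal sketch: instead of forcing the model toward one gold span, MRT minimizes the expected loss over candidate spans, so probability mass on any span that matches the answer is not penalized. The `reward` function and the toy span scores below are hypothetical stand-ins (e.g. for a ROUGE-L match against the reference answer).

```python
import math

def expected_risk(span_logprobs, reward):
    """E_p[1 - reward(span)] under the model's span distribution (MRT sketch)."""
    probs = {s: math.exp(lp) for s, lp in span_logprobs.items()}
    z = sum(probs.values())  # renormalize over the candidate set
    return sum((p / z) * (1.0 - reward(s)) for s, p in probs.items())

# Toy example: the same answer occurs at spans (0, 2) and (5, 7).
logp = {(0, 2): math.log(0.3), (5, 7): math.log(0.3), (9, 11): math.log(0.4)}
reward = lambda s: 1.0 if s in {(0, 2), (5, 7)} else 0.0  # hypothetical match score

risk = expected_risk(logp, reward)  # only the non-matching span contributes risk
```

Here the 0.3 probability assigned to each occurrence of the answer incurs no risk, whereas a single-gold-span cross-entropy loss would penalize the mass on the second occurrence.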
Highlights
Machine reading comprehension (MRC) or question answering (QA) has been a long-standing goal in Natural Language Processing
For POS tags, we use a POS tagger trained on the Chinese Treebank (CTB) data to tag each word in the questions and passages of the DuReader dataset. 64-dimensional POS tag embeddings are trained on this data using a one-layer BiLSTM model
We focus on real-world machine reading comprehension
Summary
Machine reading comprehension (MRC) or question answering (QA) has been a long-standing goal in Natural Language Processing. We propose three kinds of multi-answer loss functions and compare their performance experimentally. Another problem is the multiple occurrences of the same answer: because rich context is provided for a single question, the same answer can occur more than once, in different passages or even at different places in the same passage. In this case, using only one gold span for the answer is problematic, as the model is forced to choose one span over others containing the same content. We experiment with various alternatives on the DuReader dataset and show that our model outperforms other competing systems, raising the state-of-the-art ROUGE-L score by about 7 points
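The three multi-answer loss functions are not detailed in this summary, but the contrast the text draws can be sketched with two illustrative (hypothetical) variants: a loss that forces one gold span, versus a loss that marginalizes over every occurrence of the answer so the model is free to place probability on any of them.

```python
import math

def span_nll_single(span_logprobs, gold_span):
    # Force a single gold span: mass on other occurrences is wasted.
    return -span_logprobs[gold_span]

def span_nll_marginal(span_logprobs, gold_spans):
    # Marginalize over all occurrences: -log sum_s p(s) for s in gold_spans.
    return -math.log(sum(math.exp(span_logprobs[s]) for s in gold_spans))

# Toy span distribution; the answer occurs at both (0, 2) and (5, 7).
logp = {(0, 2): math.log(0.3), (5, 7): math.log(0.3), (9, 11): math.log(0.4)}

single = span_nll_single(logp, (0, 2))              # -log 0.3
marginal = span_nll_marginal(logp, [(0, 2), (5, 7)])  # -log 0.6
```

The marginal loss is strictly lower here because probability assigned to the second occurrence of the same answer is credited rather than penalized, which is exactly the multi-occurrence problem described above.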