Abstract

The task of machine reading comprehension (MRC) has evolved from answering simple questions over well-edited text to answering real user questions over web data. In the real-world setting, the full text of multiple relevant documents from the top search results is provided as context, and the questions, derived from user queries, include not only questions with a single, short, factual answer, but also questions about reasons, procedures, and opinions. Consequently, multiple answers can be equally valid for a single question, and each answer may occur multiple times in the context; both issues should be taken into account when building an MRC system. We propose a multi-answer multi-task framework in which different loss functions are used for multiple reference answers, and Minimum Risk Training is applied to address the multiple occurrences of a single answer. Combined with a simple heuristic passage-extraction strategy for overlong documents, our model raises the ROUGE-L score on the DuReader dataset from 44.18, the previous state of the art, to 51.09.
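
The paper compares several loss functions for handling multiple reference answers. As a rough illustration of the general idea, and not the authors' exact formulations, the sketch below combines the negative log-likelihoods of several gold spans in different ways; the function and variant names are hypothetical:

```python
import torch
import torch.nn.functional as F

def multi_answer_span_loss(start_logits, end_logits, gold_spans, mode="min"):
    """Span-extraction loss with several gold spans per question (sketch).

    start_logits, end_logits: (seq_len,) unnormalized span-boundary scores.
    gold_spans: list of (start_idx, end_idx) pairs, one per reference answer.
    mode: how to combine per-span losses -- illustrative variants only.
    """
    log_p_start = F.log_softmax(start_logits, dim=-1)
    log_p_end = F.log_softmax(end_logits, dim=-1)
    # Negative log-likelihood of each gold span.
    span_nll = torch.stack([-(log_p_start[s] + log_p_end[e])
                            for s, e in gold_spans])
    if mode == "avg":        # treat every reference answer as equally correct
        return span_nll.mean()
    if mode == "min":        # only require the closest reference to be matched
        return span_nll.min()
    if mode == "sum_prob":   # maximize total probability mass on all gold spans
        return -torch.logsumexp(-span_nll, dim=0)
    raise ValueError(mode)
```

The "min" variant only asks the model to match its closest reference, while "sum_prob" spreads probability mass across all references; which combination works best is exactly the kind of empirical question the paper studies.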

Highlights

  • Machine reading comprehension (MRC) or question answering (QA) has been a long-standing goal in Natural Language Processing

  • For POS tags, we use a POS tagger trained on the Chinese Treebank (CTB) data to tag each word in the questions and passages of the DuReader dataset. 64-dimensional POS tag embeddings are trained on this data using a one-layer BiLSTM model (a sketch follows this list)

  • We focus on real-world machine reading comprehension

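To make the POS-tag highlight concrete, here is a minimal, hypothetical PyTorch sketch of how 64-dimensional POS-tag embeddings might be combined with word embeddings and fed to a one-layer BiLSTM. The paper states only the embedding size and the encoder type; the wiring, dimensions other than 64, and all names below are assumptions:

```python
import torch
import torch.nn as nn

class POSAugmentedEncoder(nn.Module):
    """Illustrative encoder combining word and POS-tag embeddings (assumed wiring)."""

    def __init__(self, vocab_size, n_pos_tags, word_dim=300, pos_dim=64, hidden=128):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, word_dim)
        self.pos_emb = nn.Embedding(n_pos_tags, pos_dim)  # 64-dim, per the highlight
        self.encoder = nn.LSTM(word_dim + pos_dim, hidden,
                               num_layers=1, bidirectional=True, batch_first=True)

    def forward(self, word_ids, pos_ids):
        # Concatenate word and POS-tag embeddings: (batch, seq_len, word_dim + pos_dim)
        x = torch.cat([self.word_emb(word_ids), self.pos_emb(pos_ids)], dim=-1)
        out, _ = self.encoder(x)  # (batch, seq_len, 2 * hidden)
        return out
```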

Summary

Introduction

Machine reading comprehension (MRC) or question answering (QA) has been a long-standing goal in Natural Language Processing. In the real-world setting, several different answers can be equally valid for a single question; we therefore propose three kinds of multi-answer loss functions and compare their performance experimentally. Another problem is the multiple occurrence of the same answer. Because rich context is provided for a single question, the same answer can appear more than once, in different passages or even at different places within the same passage. In this case, using only one gold span for the answer is problematic, as the model is forced to choose one span over others that contain the same content. We experiment with various alternatives on the DuReader dataset and show that our model outperforms other competing systems, improving the state-of-the-art ROUGE-L score by about 7 points.
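
Since the introduction motivates Minimum Risk Training (MRT) for the multi-occurrence problem, a minimal sketch may help: instead of committing to one gold span, MRT minimizes the expected risk over a set of candidate spans, so every span that matches a reference answer is rewarded. The candidate-generation and reward helpers below (in particular reward_fn) are assumptions for illustration, not the paper's implementation:

```python
import torch
import torch.nn.functional as F

def mrt_loss(start_logits, end_logits, candidate_spans, reference_answers,
             reward_fn):
    """Minimum Risk Training over candidate answer spans (illustrative sketch).

    candidate_spans: list of (start, end) index pairs (e.g. the top-k spans).
    reward_fn: assumed helper mapping a candidate span and the references to a
        score in [0, 1], e.g. the ROUGE-L of the span's text against them.
    """
    # Model score of each candidate span, renormalized over the candidate set.
    scores = torch.stack([start_logits[s] + end_logits[e]
                          for s, e in candidate_spans])
    probs = F.softmax(scores, dim=0)
    # Risk of a span is low whenever it matches any reference occurrence.
    rewards = torch.tensor([reward_fn(span, reference_answers)
                            for span in candidate_spans])
    risk = 1.0 - rewards
    return (probs * risk).sum()  # expected risk to minimize
```

With 1 - ROUGE-L as the risk, spans whose text matches any occurrence of a reference answer receive low risk, so the model is no longer penalized for preferring one occurrence of the answer over another.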

The remaining sections of the paper are:

  • Related Work
  • Passage Extraction
  • Match Layer
  • Answer prediction with multi-answer
  • Passage selection with multi-answer
  • Minimum Risk Training
  • Experiment
  • Dataset and Evaluation Metrics
  • Word and POS Tag Embedding
  • Training and Parameters
  • Single-Answer Baseline
  • Different loss functions with multi-answer
  • Multi-task Loss and Minimum Risk Training
  • Comparison with State-of-the-art
  • Further Analysis
  • Conclusion