Abstract

We propose a new setting for question answering in which users can query the system using both natural language and direct interactions within a graphical user interface that displays multiple time series associated with an entity of interest. The user interacts with the interface in order to understand the entity's state and behavior, entailing sequences of actions and questions whose answers may depend on previous factual or navigational interactions. We describe a pipeline implementation where spoken questions are first transcribed into text which is then semantically parsed into logical forms that can be used to automatically extract the answer from the underlying database. The speech recognition module is implemented by adapting a pre-trained LSTM-based architecture to the user's speech, whereas for the semantic parsing component we introduce an LSTM-based encoder-decoder architecture that models context dependency through copying mechanisms and multiple levels of attention over inputs and previous outputs. When evaluated separately, with and without data augmentation, both models are shown to substantially outperform several strong baselines. Furthermore, the full pipeline evaluation shows only a small degradation in semantic parsing accuracy, demonstrating that the semantic parser is robust to mistakes in the speech recognition output. The new question answering paradigm proposed in this paper has the potential to improve the presentation and navigation of the large amounts of sensor data and life events that are generated in many areas of medicine.
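To make the pipeline concrete, here is a minimal sketch of how the stages described above could be wired together. The component interfaces (`transcribe`, `parse`, `lf_from_click`, `execute`) and the `Interaction` container are illustrative assumptions for this sketch, not the actual APIs used in the paper.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Interaction:
    """One user turn: either a spoken question or a direct GUI interaction."""
    audio: Optional[bytes] = None     # spoken question, if any
    gui_event: Optional[dict] = None  # e.g. a mouse click on a displayed time series

def answer(interaction, history, asr, parser, db):
    """Run one interaction through the speech -> text -> logical form -> answer pipeline.

    `history` keeps the logical forms of earlier interactions so that
    context-dependent (follow-up) questions can be resolved.
    """
    if interaction.gui_event is not None:
        # GUI interactions such as mouse clicks are translated into
        # logical forms deterministically.
        lf = parser.lf_from_click(interaction.gui_event)
    else:
        # Spoken questions are transcribed first, then semantically parsed
        # into a logical form, conditioning on the interaction history.
        text = asr.transcribe(interaction.audio)
        lf = parser.parse(text, context=history)
    history.append(lf)
    # The logical form is executed against the underlying time-series database.
    return db.execute(lf)
```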

Highlights

  • Wearable sensors are being increasingly used in medicine to monitor important physiological parameters

  • We introduced a new question answering (QA) paradigm in which users can query a system using both natural language (NL) and direct interactions within a graphical user interface (GUI)

  • Using medical data acquired from patients with type 1 diabetes as a case study, we proposed a pipeline implementation where a speech recognizer transcribes spoken questions and commands from doctors into a semantically equivalent text, which is then semantically parsed into a logical form (LF)

Summary

Speech recognition

This section describes the speech recognition system, which is the first module in the proposed semantic parsing pipeline. The WER performance of the speech recognition system on the Amber dataset, with and without data augmentation, is shown in the top half of Table 2. Mouse clicks were automatically translated into LFs, whereas questions were parsed into LFs manually, to be used for training and evaluating the semantic parsing algorithms. The Frank dataset contains LFs for 237 interactions, corresponding to 74 mouse clicks and 163 NL queries. The Amber dataset contains LFs for 504 interactions, corresponding to 330 mouse clicks and 174 NL queries. The simulator was used to generate 1000 interactions and their LFs: 312 mouse clicks and 688 NL queries.
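Since word error rate (WER) is the metric reported for the speech recognition module, the short sketch below recomputes it from first principles: the word-level Levenshtein distance between a reference transcript and the recognizer's hypothesis, normalized by the reference length. The example strings are hypothetical and not drawn from the Amber dataset.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: edit distance over words divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words
    # (substitutions, insertions, and deletions all cost 1).
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# Example: one substituted word out of four reference words -> WER = 0.25
print(wer("what was the glucose", "what was a glucose"))
```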

Baseline models for semantic parsing
Semantic parsing with multiple levels of attention and copying mechanism
Token-level supervised learning
Sequence-level reinforcement learning
Experimental evaluation
Semantic parsing pipeline evaluation
Findings
Conclusion and future work