Abstract

We present set to ordered text, a natural language generation task applied to automatically generating discharge instructions from admission ICD (International Classification of Diseases) codes. This task differs from other natural language generation tasks in the following ways: (1) The input is a set of identifiable entities (ICD codes) where the relations between individual entities are not explicitly specified. (2) The output text is not a narrative description (e.g., news articles) composed from the input. Rather, inferences are made from the input (symptoms specified in ICD codes) to generate the output (instructions). (3) There is an optimal order in which each sentence (instruction) should appear in the output. Unlike most other tasks, neither the input (ICD codes) nor their corresponding symptoms appear in the output, so the ordering of the output instructions needs to be learned in an unsupervised fashion. Based on clinical intuition, we hypothesize that each instruction in the output is mapped to a subset of ICD codes specified in the input. We propose a neural architecture that jointly models (a) subset selection: choosing relevant subsets from a set of input entities; (b) content ordering: learning the order of instructions; and (c) text generation: representing the instructions corresponding to the selected subsets in natural language. In addition, we penalize redundancy during beam search to improve tractability for long text generation. Our model outperforms baseline models in BLEU scores and human evaluation. We plan to extend this work to other tasks such as recipe generation from ingredients.
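To make the idea of penalizing redundancy during beam search concrete, the following is a minimal sketch and not the paper's decoder. It assumes a hypothetical penalty that subtracts a fixed amount from a hypothesis score for every repeated bigram; the names `repeated_bigrams`, `beam_search`, `toy_step`, and the `penalty` parameter are illustrative, and the toy next-token distribution stands in for a trained model.

```python
import math
from typing import Callable, Dict, List, Tuple

Tokens = Tuple[str, ...]
StepFn = Callable[[Tokens], Dict[str, float]]  # prefix -> {token: log-prob}


def repeated_bigrams(tokens: Tokens) -> int:
    """Count bigram occurrences beyond the first appearance of each bigram."""
    seen = set()
    repeats = 0
    for bigram in zip(tokens, tokens[1:]):
        if bigram in seen:
            repeats += 1
        else:
            seen.add(bigram)
    return repeats


def beam_search(step_fn: StepFn, beam_size: int = 3, max_len: int = 8,
                penalty: float = 1.0, eos: str = "</s>") -> List[str]:
    """Beam search whose ranking score is log-prob minus a redundancy penalty."""

    def score(item: Tuple[Tokens, float]) -> float:
        tokens, logp = item
        # Assumed penalty form: a fixed cost per repeated bigram.
        return logp - penalty * repeated_bigrams(tokens)

    beams: List[Tuple[Tokens, float]] = [((), 0.0)]
    finished: List[Tuple[Tokens, float]] = []
    for _ in range(max_len):
        candidates = []
        for tokens, logp in beams:
            for tok, tok_logp in step_fn(tokens).items():
                candidates.append((tokens + (tok,), logp + tok_logp))
        candidates.sort(key=score, reverse=True)
        beams = []
        for cand in candidates[:beam_size]:
            # Hypotheses that emit the end token leave the beam.
            (finished if cand[0][-1] == eos else beams).append(cand)
        if not beams:
            break
    return list(max(finished or beams, key=score)[0])


def toy_step(prefix: Tokens) -> Dict[str, float]:
    """Hypothetical toy model that prefers looping on 'take meds'."""
    last = prefix[-1] if prefix else None
    if last is None:
        return {"take": math.log(0.9), "rest": math.log(0.1)}
    if last == "take":
        return {"meds": 0.0}
    if last == "meds":
        if len(prefix) >= 4:
            return {"</s>": 0.0}
        return {"take": math.log(0.7), "rest": math.log(0.3)}
    if last == "rest":
        return {"well": 0.0}
    return {"</s>": 0.0}  # after "well"


print(beam_search(toy_step, penalty=0.0))
print(beam_search(toy_step, penalty=2.0))
```

With `penalty=0.0` the toy model's most probable output repeats the same phrase; with a positive penalty the repeated hypothesis is ranked below a less probable but non-redundant one. The bigram-count penalty is only one plausible choice made for this sketch.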

Highlights

  • Proposed Approach: We hypothesize that each discharge instruction in the output is mapped to a subset of ICD codes specified in the input

  • 1.1 Problem Statement: Many healthcare applications exhibit a strong mapping between numerical or categorical infor- [Figure: an input ICD set (ICD1 to ICD7) and ICD subsets drawn from it, such as {ICD1, ICD3, ICD7} and {ICD1, ICD3, ICD5}]

  • Inferences are made from the input (ICD codes, which represent diagnoses and clinical procedures) to generate the output


Summary

Proposed Approach

We hypothesize that each discharge instruction in the output is mapped to a subset of ICD codes specified in the input. Our proposed approach models the correlations between individual entities in the input set to choose the most relevant subsets, and learns to generate their corresponding textual outputs in the appropriate order. We incorporate explicit means for reducing redundancy during decoding. The proposed approach is applied to generating discharge instructions from ICD codes assigned during hospital admissions.

Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, pages 6165–6175, Hong Kong, China, November 3–7, 2019. © 2019 Association for Computational Linguistics
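The hypothesis above, that each instruction maps to a subset of the input ICD codes, can be illustrated with a small non-neural sketch. It is only a toy: the paper learns subset selection, ordering, and generation jointly with a neural model, whereas this example assumes a hand-written lookup table. The `Rule` type, the `RULES` table, the `position` ranks, and the placeholder instruction strings are all hypothetical.

```python
from typing import FrozenSet, Iterable, List, NamedTuple


class Rule(NamedTuple):
    codes: FrozenSet[str]  # subset of ICD codes tied to one instruction
    position: int          # hypothetical rank of the instruction in the output
    text: str              # placeholder instruction text


# Hypothetical table; a trained model would learn these associations.
RULES = [
    Rule(frozenset({"ICD1", "ICD3", "ICD7"}), 2, "placeholder instruction B"),
    Rule(frozenset({"ICD1", "ICD3", "ICD5"}), 1, "placeholder instruction A"),
    Rule(frozenset({"ICD2", "ICD8"}), 3, "placeholder instruction C"),
]


def generate(icd_codes: Iterable[str]) -> List[str]:
    codes = frozenset(icd_codes)  # the input is an unordered set
    # (a) subset selection: keep rules whose codes are all present in the input
    selected = [rule for rule in RULES if rule.codes <= codes]
    # (b) content ordering: arrange the selected instructions by rank
    ordered = sorted(selected, key=lambda rule: rule.position)
    # (c) text generation: emit the text attached to each selected subset
    return [rule.text for rule in ordered]


admission = {"ICD1", "ICD2", "ICD3", "ICD4", "ICD5", "ICD6", "ICD7"}
print(generate(admission))
```

Because the input is a set, the result does not depend on the order in which the codes are listed; only the selected subsets and their ranks determine the output.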

Relation to Other Work
Neural Architecture
Content and Subset Selection
Content Ordering and Instruction Generation
Beam Search with Redundancy Penalization
Implementation Details
Corpus
Seq2Seq
Set2SingleSeq
Set2MultipleSeq
Evaluation I
Evaluation II
Evaluation III
Grammaticality
Informativeness
Qualitative Comparison Across Models
Variability in Groundtruth References
Natural Language Generation
Text Generation in Healthcare
Conclusion
