Abstract

Relying on large pretrained language models such as Bidirectional Encoder Representations from Transformers (BERT) for encoding and adding a simple prediction layer has led to impressive performance in many clinical natural language processing (NLP) tasks. In this work, we present a novel extension to the Transformer architecture that incorporates the signature transform into the self-attention model. This component is inserted between the embedding and prediction layers. Experiments on a new Swedish prescription dataset show the proposed architecture to be superior to baseline models on two of the three information extraction tasks. Finally, we evaluate two different embedding approaches: applying Multilingual BERT directly to the Swedish text, and translating the Swedish text to English and then encoding it with a BERT model pretrained on clinical notes.
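
As a rough illustration of the component described in the abstract, the sketch below applies scaled dot-product self-attention to a sequence of token embeddings, takes the signature transform of the resulting hidden-state path, and feeds the fixed-size signature features to a linear prediction layer. It is a minimal sketch only: the toy dimensions, the signature depth, and the use of the iisignature package are our assumptions, not the paper's implementation.

    # Minimal sketch (not the authors' implementation): self-attention over token
    # embeddings, a signature transform of the resulting sequence, then a linear
    # prediction layer. Toy sizes, depth 2 and the iisignature package are assumptions.
    import numpy as np
    import iisignature

    def self_attention(X):
        # X: (seq_len, d_model) embeddings; Q = K = V = X for plain self-attention.
        d = X.shape[1]
        scores = X @ X.T / np.sqrt(d)                  # scaled dot products of queries and keys
        weights = np.exp(scores - scores.max(axis=1, keepdims=True))
        weights /= weights.sum(axis=1, keepdims=True)  # softmax over the keys
        return weights @ X                             # weighted sum of the values

    rng = np.random.default_rng(0)
    X = rng.normal(size=(12, 8))                       # 12 tokens, 8-dimensional embeddings
    H = self_attention(X)
    feats = iisignature.sig(H, 2)                      # signature up to depth 2: 8 + 8**2 = 72 features
    W, b = rng.normal(size=(feats.size, 3)), np.zeros(3)
    logits = feats @ W + b                             # simple prediction layer over 3 classes
    print(logits.shape)                                # (3,)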

Highlights

  • Medical prescription notes written by clinicians about patients contain valuable information that the structured part of electronic health records (EHRs) does not have

  • More recently, with the advent of Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al., 2018), fine-tuning of general-domain language models has been widely adopted for many clinical natural language processing (NLP) tasks

  • Considering our data is in Swedish, we explore two different approaches for encoding the prescription notes: (1) apply Multilingual BERT (M-BERT) (Devlin et al., 2018) directly to the Swedish text; (2) translate the prescriptions to English as described in Section 4.1 and encode the translated text with ClinicalBERT (Huang et al., 2020); a minimal encoding sketch follows this list
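
To make the two highlighted encoding options concrete, here is a minimal sketch using the Hugging Face transformers library. The checkpoint names (bert-base-multilingual-cased for M-BERT and emilyalsentzer/Bio_ClinicalBERT as a clinical-domain stand-in), the example notes, and the use of the [CLS] vector are assumptions for illustration; the paper's exact ClinicalBERT release (Huang et al., 2020) and its translation step (Section 4.1) are not reproduced here.

    # Minimal sketch of the two encoding options; checkpoint names and example notes
    # are assumptions for illustration, not the paper's exact setup.
    import torch
    from transformers import AutoModel, AutoTokenizer

    def encode(texts, checkpoint):
        # Return one [CLS] vector per note using the given BERT checkpoint.
        tokenizer = AutoTokenizer.from_pretrained(checkpoint)
        model = AutoModel.from_pretrained(checkpoint)
        batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
        with torch.no_grad():
            out = model(**batch)
        return out.last_hidden_state[:, 0]              # [CLS] token embedding per input

    swedish_notes = ["1 tablett 2 gånger dagligen"]     # made-up Swedish prescription note
    translated_notes = ["1 tablet 2 times daily"]       # its English translation (translation step not shown)

    # Approach 1: apply M-BERT directly to the Swedish text.
    swe_vecs = encode(swedish_notes, "bert-base-multilingual-cased")
    # Approach 2: encode the translated text with a clinical-domain BERT checkpoint.
    eng_vecs = encode(translated_notes, "emilyalsentzer/Bio_ClinicalBERT")
    print(swe_vecs.shape, eng_vecs.shape)               # torch.Size([1, 768]) each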

Summary

Introduction

Medical prescription notes written by clinicians about patients contain valuable information that the structured part of electronic health records (EHRs) does not have. With the advent of Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al., 2018), fine-tuning of general-domain language models has been widely adopted for many clinical NLP tasks. More recently, several studies have fine-tuned general-domain language models such as BERT on in-domain clinical text (e.g., electronic health records) (Si et al., 2019; Peng et al., 2019; Alsentzer et al., 2019; Huang et al., 2020) for downstream clinical NLP tasks. The attention function used in the Transformer model takes three input vectors: query (Q), key (K) and value (V). It generates an output vector by computing a weighted sum of the values, where the weights are obtained from the dot products of the query with all keys, scaled and passed through a softmax function.
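
Written out, this is the standard scaled dot-product attention. With $d_k$ denoting the key dimension, the formula below uses the usual Transformer notation rather than an equation reproduced from the paper:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right) V$$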


