Abstract

The Centers for Medicare & Medicaid Services Incentive Programs promote meaningful use of electronic health records (EHRs), which, among many benefits, allow patients to receive electronic copies of their EHRs and thereby empower them to take a more active role in their health. In the United States, however, 17% of the population is Hispanic, of whom 50% have limited English language skills. To help this population take advantage of their EHRs, we are developing English-Spanish machine translation (MT) systems for EHRs. In this study, we first built an English-Spanish parallel corpus and trained NoteAidSpanish, a statistical MT (SMT) system. Google Translator and Microsoft Bing Translator served as two baseline MT systems. In addition, we evaluated hybrid MT systems that first replace medical jargon in EHR notes with lay terms and then translate the notes with SMT systems. In an evaluation on a small set of EHR notes, Google Translator outperformed NoteAidSpanish. In the hybrid SMT systems, first mapping medical jargon to lay language improved the translation. A fully implemented hybrid MT system is available at http://www.clinicalnotesaid.org. The English-Spanish parallel-aligned MedlinePlus corpus is available upon request.

Highlights

  • MT has been an active research field for the past 60–70 years

  • Translation patterns include phrase translations that translate sequences of words at a time (Koehn et al, 2003; Och, 2002), re-ordering tendencies that allow swapping of words or phrases (Tillmann, 2004), hierarchical phrase translations with variables (Chiang, 2007), and syntax-based transformations (Galley et al, 2004)

  • We found that 17.9% of all terms in electronic health record (EHR) notes do not appear in the MedlinePlus corpus
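To make the coverage figure in the last highlight concrete, here is a minimal sketch of how an out-of-vocabulary rate could be computed. It is not the authors' procedure: the tokenizer, the choice to count distinct single-word terms rather than running tokens or multi-word terms, and the function names `tokenize`, `build_vocabulary`, and `oov_rate` are all assumptions made for illustration.

```python
import re
from typing import Iterable, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into simple word tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+(?:[-'][a-z0-9]+)*", text.lower())


def build_vocabulary(corpus_sentences: Iterable[str]) -> Set[str]:
    """Collect every distinct token seen in the corpus sentences."""
    vocabulary: Set[str] = set()
    for sentence in corpus_sentences:
        vocabulary.update(tokenize(sentence))
    return vocabulary


def oov_rate(note_text: str, vocabulary: Set[str]) -> float:
    """Fraction of distinct note terms that are missing from the vocabulary.

    Assumption: "terms" are distinct single-word tokens; the paper may
    count terms differently.
    """
    terms = set(tokenize(note_text))
    if not terms:
        return 0.0
    missing = terms - vocabulary
    return len(missing) / len(terms)
```

Calling `oov_rate(note, build_vocabulary(corpus))` on a note and the English side of a parallel corpus gives a value between 0 and 1; a high value suggests the corpus alone may not cover the vocabulary of clinical notes.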


Summary

Methods

The best SMT systems are built from translation patterns that are learned automatically from parallel, human-translated text corpora (Koehn, 2010). Several research groups have built parallel corpora and trained SMT systems (Wu et al, 2011; Yepes et al, 2013). Zeng-Treitler et al (2010) used a general-purpose MT tool called Babel Fish to translate 213 EHR note sentences from English into Spanish, Chinese, Russian, and Korean and evaluated the comprehensibility and accuracy of the translations. We first built a domain-specific English-Spanish parallel aligned corpus and then developed and evaluated SMT and hybrid machine translation (HMT) systems for translating EHR notes from English to Spanish.
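The hybrid approach described in the abstract (replace medical jargon with lay terms, then translate) can be sketched roughly as follows. This is an illustrative outline only, not the NoteAid implementation: the lexicon entries, the function names `replace_jargon` and `hybrid_translate`, and the `translate` callable standing in for an SMT system are all hypothetical.

```python
import re
from typing import Callable, Dict

# Hypothetical jargon-to-lay lexicon (keys must be lowercase).
# These entries are examples, not taken from the paper's resources.
JARGON_TO_LAY: Dict[str, str] = {
    "myocardial infarction": "heart attack",
    "hypertension": "high blood pressure",
    "dyspnea": "shortness of breath",
}


def replace_jargon(text: str, lexicon: Dict[str, str]) -> str:
    """Replace each jargon term in the text with its lay equivalent."""
    if not lexicon:
        return text
    # Try longer terms first so multi-word jargon wins over shorter overlaps.
    terms = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in terms) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda match: lexicon[match.group(0).lower()], text)


def hybrid_translate(
    text: str,
    lexicon: Dict[str, str],
    translate: Callable[[str], str],
) -> str:
    """Simplify jargon first, then pass the result to any translation function."""
    return translate(replace_jargon(text, lexicon))
```

Here `translate` can be any function that maps English text to Spanish text, such as a wrapper around a trained SMT model. Keeping it as a parameter means the same jargon-replacement step can be paired with different translation back ends.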

English-Spanish Parallel Aligned Biomedical Corpora
MT Systems
Automatic Evaluation
Evaluation by a Domain Expert
Discussion