Physicians’ letters are the optimal source of diagnoses for registries. However, most registries demand for diagnosis codes such as ICD-10. We herein describe an algorithm that infers ICD-10 codes from German ophthalmologic physicians’ letters. We assess the method in three German eye hospitals. Our algorithm is based on the nearest-neighbor method as well as on a large thesaurus for ICD-10 codes. This thesaurus was embedded into a Word2Vec space created from anonymized physicians’ reports of the first hospital. For evaluation, each of the three hospitals sent all diagnoses taken from 100 letters. The inferred ICD-10 codes were evaluated for correctness by the senders. A total of 3332 natural language terms had been sent in (812 hospital one, 1473 hospital two, 1047 hospital three). A total of 526 non-diagnoses were excluded upfront. 2806 ICD-10 codes were inferred (771 hospital one, 1226 hospital two, 809 hospital three). In the first hospital, 98% were fully correct and 99% correct at the level of the superordinate disease concept. The percentages in hospital two were 69% and 86%. The respective numbers for hospital three were 69% and 91%. Our simple method is capable of inferring ICD-10 codes for German natural language diagnoses, especially when the embedding space has been built with physicians’ letters from the same hospital. The method may yield sufficient accuracy for many tasks in the multi-centric setting and can easily be adapted to other languages/specialities.
Read full abstract