Smoothing dense spaces for improved relation extraction between drugs and adverse reactions

Sara Santiso, Alicia Pérez, Arantza Casillas

doi:10.1016/j.ijmedinf.2019.05.009

Abstract

Background and objective: This work aims at extracting Adverse Drug Reactions (ADRs), i.e. harm directly caused by a drug at normal doses, from Electronic Health Records (EHRs). The lack of readily available EHRs, owing to confidentiality issues, and their lexical variability make ADR extraction challenging. Furthermore, ADRs are rare events. Therefore, representations that are robust to data sparsity are needed.

Methods: Embedding-based characterizations are able to group semantically related words. However, dense spaces suffer from data sparsity. We employed context-aware continuous representations to enhance the modelling of infrequent events through their context, and we turned to simple smoothing techniques (e.g. direction cosines, truncation, Principal Component Analysis (PCA) and clustering) to increase the proximity between similar words in an attempt to cope with data sparsity.

Results: An F-measure of 0.639 was achieved for the ADR classification, an improvement of approximately 0.300 over the results obtained with a word-based characterization.

Conclusion: The embedding-based representation together with the smoothing techniques increased the robustness of the ADR characterization. It proved particularly appropriate for coping with lexical variability and data sparsity.
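As a rough illustration of three of the smoothing techniques named in the Methods, the sketch below applies direction cosines, truncation and PCA to a small matrix of random vectors standing in for word embeddings. The abstract does not give the exact definitions used, so everything here is an assumption: direction cosines are taken to mean scaling each vector to unit length, truncation is taken to mean cutting each component to a fixed number of decimals, and the function names, the `decimals` and `n_components` parameters and the toy data are all hypothetical.

```python
import numpy as np

def direction_cosines(X):
    """Scale each row to unit length (assumed meaning of 'direction cosines')."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero rows unchanged
    return X / norms

def truncate(X, decimals=1):
    """Cut each component to a fixed number of decimals (assumed meaning of 'truncation')."""
    factor = 10.0 ** decimals
    return np.trunc(X * factor) / factor

def pca_reduce(X, n_components=2):
    """Project mean-centred rows onto the top principal components via SVD."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

# Toy stand-in for word embeddings: 6 "words", 5 dimensions (not real data).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(6, 5))

unit = direction_cosines(embeddings)         # rows now have norm 1
coarse = truncate(unit, decimals=1)          # nearby vectors collapse to the same values
reduced = pca_reduce(unit, n_components=2)   # 6 x 2 dense representation
```

Each step makes the space coarser in a different way: normalization removes magnitude, truncation merges vectors that differ only in low-order digits, and PCA discards low-variance directions. Clustering, the fourth technique mentioned, is omitted here to keep the sketch short.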

Full Text