DiseaSE: A biomedical text analytics system for disease symptom extraction and characterization.

Muhammad Abulaish, Md. Aslam Parwez, Jahiruddin Jahiruddin

doi:10.1016/j.jbi.2019.103324

Open Access

https://doi.org/10.1016/j.jbi.2019.103324

Journal: Journal of Biomedical Informatics
Publication Date: Oct 31, 2019
Citations: 11
License type: elsevier-specific: oa user license

Affiliation: South Asian University, Jamia Millia Islamia

Abstract

Owing to the increasing volume and unstructured nature of the scientific literature in the biomedical domain, most of the information embedded within it remains untapped. This paper presents a biomedical text analytics system, DiseaSE (Disease Symptom Extraction), that identifies and extracts disease symptoms and their associations from biomedical text documents retrieved from the PubMed database. It applies various NLP and information extraction techniques to convert text documents into record-size information components, which are represented as semantic triples and processed using TextRank and other ranking techniques to identify feasible disease symptoms. Eight diseases (dengue, malaria, cholera, diarrhoea, influenza, meningitis, leishmaniasis, and kala-azar) are considered for the experimental evaluation of DiseaSE. Our analysis shows that the system can identify new symptoms that are not even catalogued on standard websites such as those of the Centers for Disease Control and Prevention (CDC), the World Health Organization (WHO), and the National Health Survey (NHS). DiseaSE also aims to compile generic associations between a disease and its symptoms, and presents a graph-theoretic analysis and visualization scheme to characterize diseases at different levels of granularity. The identified disease symptoms and their associations could be useful for generating a biomedical knowledge base (e.g., a disease ontology) for the development of e-health and disease surveillance systems.
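
The ranking step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each information component is a (subject, relation, object) triple, links subject and object in an undirected graph, and scores nodes with a plain TextRank-style (PageRank) iteration. The function names, the damping factor, and the toy triples are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_graph(triples):
    """Build an undirected graph linking the subject and object of each triple."""
    graph = defaultdict(set)
    for subject, _relation, obj in triples:
        if subject != obj:
            graph[subject].add(obj)
            graph[obj].add(subject)
    return graph


def textrank(graph, damping=0.85, iterations=50):
    """Score nodes with an unweighted TextRank (PageRank) power iteration."""
    nodes = list(graph)
    scores = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        updated = {}
        for n in nodes:
            # Each neighbour m passes on an equal share of its current score.
            incoming = sum(scores[m] / len(graph[m]) for m in graph[n])
            updated[n] = (1 - damping) / len(nodes) + damping * incoming
        scores = updated
    return scores


def rank_symptoms(triples, disease, top_k=5):
    """Return candidate terms other than the disease itself, best score first."""
    graph = build_graph(triples)
    scores = textrank(graph)
    candidates = [n for n in graph if n != disease]
    return sorted(candidates, key=lambda n: scores[n], reverse=True)[:top_k]


# Toy triples for illustration only; they are not taken from the paper's data.
triples = [
    ("dengue", "causes", "fever"),
    ("dengue", "is associated with", "headache"),
    ("dengue", "presents with", "rash"),
    ("fever", "accompanies", "headache"),
]
scores = textrank(build_graph(triples))
ranked = rank_symptoms(triples, "dengue")
```

In this toy graph, terms that are connected to more neighbours receive higher scores, so a candidate mentioned in only one triple ranks below candidates that also co-occur with other candidates. A real system would presumably weight edges and filter candidates against a symptom vocabulary, which this sketch does not attempt.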

Full Text