Abstract

Owing to the increasing volume and unstructured nature of the scientific literature in the biomedical domain, much of the information embedded within it remains untapped. This paper presents a biomedical text analytics system, DiseaSE (Disease Symptom Extraction), that identifies and extracts disease symptoms and their associations from biomedical text documents retrieved from the PubMed database. It applies various NLP and information extraction techniques to convert text documents into record-size information components, which are represented as semantic triples and processed using TextRank and other ranking techniques to identify feasible disease symptoms. Eight diseases, namely dengue, malaria, cholera, diarrhoea, influenza, meningitis, leishmaniasis, and kala-azar, are considered for the experimental evaluation of the proposed DiseaSE system. Our analysis found that the DiseaSE system is able to identify new symptoms that are not even catalogued on standard websites such as those of the Centers for Disease Control and Prevention (CDC), the World Health Organization (WHO), and the National Health Service (NHS). The proposed DiseaSE system also aims to compile generic associations between a disease and its symptoms, and presents a graph-theoretic analysis and visualization scheme to characterize diseases at different levels of granularity. The identified disease symptoms and their associations could be useful for building a biomedical knowledge base (e.g., a disease ontology) for the development of e-health and disease surveillance systems.
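The abstract says only that semantic triples are ranked with TextRank and other ranking techniques; it does not describe how the ranking graph is built. The Python sketch below illustrates one plausible TextRank-style ranking under stated assumptions, not the DiseaSE implementation. It assumes that nodes are the subject and object terms of the triples, that each triple links its subject and object with an undirected edge weighted by how often the pair occurs, and that candidate symptoms are simply triple objects other than the disease itself. The function names build_graph, textrank, and rank_symptoms, and the example triples, are hypothetical and for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def build_graph(triples: Iterable[Triple]) -> Dict[str, Dict[str, float]]:
    """Undirected weighted graph (an assumption): the subject and object of each
    triple are linked, and the edge weight counts how many triples join them."""
    graph: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for subj, _pred, obj in triples:
        s, o = subj.lower().strip(), obj.lower().strip()
        if s == o:  # skip self-loops so every node has a positive total weight
            continue
        graph[s][o] += 1.0
        graph[o][s] += 1.0
    return graph


def textrank(graph: Dict[str, Dict[str, float]], damping: float = 0.85,
             max_iter: int = 100, tol: float = 1e-6) -> Dict[str, float]:
    """Weighted TextRank iteration:
    WS(i) = (1 - d) + d * sum_j (w_ji / sum_k w_jk) * WS(j)."""
    if not graph:
        return {}
    scores = {node: 1.0 for node in graph}
    out_weight = {node: sum(nbrs.values()) for node, nbrs in graph.items()}
    for _ in range(max_iter):
        new_scores = {}
        for node, nbrs in graph.items():
            # Each neighbour passes on a share of its score proportional to edge weight.
            rank = sum(w / out_weight[nbr] * scores[nbr] for nbr, w in nbrs.items())
            new_scores[node] = (1 - damping) + damping * rank
        delta = max(abs(new_scores[n] - scores[n]) for n in graph)
        scores = new_scores
        if delta < tol:
            break
    return scores


def rank_symptoms(triples: Iterable[Triple], disease: str,
                  top_k: int = 10) -> List[Tuple[str, float]]:
    """Rank candidate symptom terms for one disease by TextRank score."""
    triples = list(triples)
    scores = textrank(build_graph(triples))
    disease = disease.lower().strip()
    # Hypothetical candidate filter: triple objects, excluding the disease itself.
    candidates = {obj.lower().strip() for _s, _p, obj in triples} - {disease}
    ranked = sorted(((t, scores[t]) for t in candidates if t in scores),
                    key=lambda ts: (-ts[1], ts[0]))
    return ranked[:top_k]


# Hypothetical triples for illustration only; not data from the paper.
EXAMPLE_TRIPLES: List[Triple] = [
    ("dengue", "causes", "fever"),
    ("dengue", "causes", "headache"),
    ("dengue", "presents with", "joint pain"),
    ("patients", "reported", "fever"),
    ("patients", "reported", "headache"),
    ("fever", "accompanied by", "rash"),
]

if __name__ == "__main__":
    for term, score in rank_symptoms(EXAMPLE_TRIPLES, "dengue"):
        print(f"{term}: {score:.3f}")
```

In this toy graph, a term linked to many other terms (such as fever) accumulates a higher score than a term with a single link (such as rash), which is the intuition behind using TextRank to surface frequently supported symptoms. The actual system may build its graph differently and combines TextRank with other ranking signals that this sketch omits.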
