Abstract

Phenotypes (i.e., symptoms and clinical signs) are essential for clinical diagnosis and research related to symptom science and precision health. As clinical observational manifestations of a disease, symptoms are clinically significant because they act as direct causes for patients to seek medical care and the primary indicators for clinicians to provide diagnosis/treatments. However, a comprehensive phenotypic knowledge base and high-quality symptom–gene associations are lacking. Therefore, a thorough understanding of the relationships between symptoms and other entities is urgently needed to support scientific research and clinical health care. In this paper, we constructed a systematic, large-scale, and high-quality symptom-gene associations network system named SympGAN (accessible at http://www.sympgan.org/). We provide access to the database with millions of associations between symptoms, genes, diseases, and drugs, as well as the system for users to search, analyze, knowledge inference, and present data visualization. We utilize state-of-the-art machine learning and deep learning algorithms as the backbone to form the final dataset. In addition, we utilize the RoBERTa-PubMed neural network for name entity recognition to assist in data screening. The knowledge graph is adopted to organize the relationships between different entities. We adopt ConvE, TuckER, and HypER methods for knowledge completion experiments to validate the quality of final knowledge graph triples. Based on the results, we provide online automatic knowledge inference interfaces. The system, SympGAN, has promising value for disease diagnosis, decision support in health care, precision health, and scientific research, as researchers and practitioners can easily access information about symptoms, diseases, targets, gene ontology, and drugs.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call