Objectives/Goals: Rare disease patients often face lengthy delays in receiving an accurate diagnosis, or are misdiagnosed, because of a lack of available information. The NCATS Rare Disease Alert System (RDAS) is a public, comprehensive rare disease resource that collects and shares accurate, up-to-date, and standardized data on rare diseases. Methods/Study Population: RDAS is composed of a front-end UI, Application Programming Interfaces (APIs), and a back-end Neo4j graph database. Each graph database was created through four steps: data collection, data annotation, data standardization, and data representation. The UI allows users to search, browse, and subscribe to RDAS to receive the latest information and findings about their rare disease(s) of interest. The back-end data comprise four knowledge graphs built by integrating information from the NCATS Genetic and Rare Disease program, PubMed articles, clinical trials, and NIH grant funding. Ultimately, the integrated rare disease information in RDAS is intended to advance rare disease research. Results/Anticipated Results: Of 5001 rare diseases belonging to 32 distinct disease categories, we identified 1294 diseases that map to 45,647 distinct NIH-funded projects obtained from NIH ExPORTER by semantically annotating project titles. To capture the semantic relationships among the mapped research funding data, we defined a data model comprising seven primary classes and their corresponding object and data properties. We developed a Neo4j knowledge graph based on this data model and performed multiple case studies over it to demonstrate its use in directing and promoting rare disease research. Discussion/Significance of Impact: We developed an integrative knowledge graph of rare disease data and demonstrated its use as a source for identifying and generating scientific evidence to support rare disease research. Building on this study, we plan to apply advanced computational methods to analyze more funding-related data and to link it to other types of data for further research.