Background: Distributed graph databases are a promising means of storing large-scale drug knowledge graphs and running complex pathway queries on them to support drug research. However, the storage and query performance of drug knowledge graphs on distributed graph databases has not been systematically evaluated. This study evaluates the feasibility and performance of distributed graph databases in managing large-scale drug knowledge graphs. Methods: First, a drug knowledge graph storage and query system is designed based on the Nebula Graph database. Second, the system’s write and query performance is evaluated. Finally, two drug repurposing benchmarks are used to provide a more extensive and reliable assessment. Results: The distributed graph database outperforms single-machine databases in data writing, regular queries, constrained queries, and concurrent queries. Its advantage in write performance becomes more pronounced as the data volume increases, and its advantage in query performance grows with the complexity of the query task. The drug repurposing evaluation shows that 78.54% of the pathways are consistent with currently approved drug treatments according to repoDB. Additionally, 12 potential pathways for new drug indications are found to have literature support according to DrugRepoBank. Conclusions: The proposed system can construct, store, and query a large graph of multisource drug knowledge, and it provides reliable and explainable drug–disease paths for drug repurposing.