Abstract

The increasing use of Semantic Web technologies in the life sciences, in particular the Resource Description Framework (RDF) and the RDF query language SPARQL, opens the way for novel integrative analyses that combine information from multiple sources. However, analyzing evolutionary data in RDF is not trivial, due to the steep learning curve required to understand both the data models adopted by different RDF data sources and the SPARQL query language itself. In this article, we provide a hands-on introduction to querying evolutionary data across multiple sources that publish orthology information in RDF, namely the Orthologous MAtrix (OMA), the European Bioinformatics Institute (EBI) RDF platform, the Database of Orthologous Groups (OrthoDB) and the Microbial Genome Database (MBGD). We present four protocols in increasing order of complexity. In these protocols, we demonstrate through SPARQL queries how to retrieve pairwise orthologs, homologous groups, and hierarchical orthologous groups. Finally, we show how orthology information in different sources can be compared through the use of federated SPARQL queries.
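To make the pairwise-ortholog retrieval more concrete, the Python sketch below (not taken from the article) builds a SPARQL query for the orthologs of a single protein and sends it to an endpoint using only the standard library. It assumes the Orthology Ontology (ORTH) terms `orth:OrthologsCluster`, `orth:hasHomologousMember` and `orth:Protein` in the namespace `http://purl.org/net/orth#`, and it assumes a modelling in which two proteins are orthologs when they sit under different members of the same orthologs cluster. The endpoint address in `OMA_ENDPOINT` is likewise an assumption to check against the OMA documentation.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint address; verify it against the OMA documentation.
OMA_ENDPOINT = "https://sparql.omabrowser.org/sparql"


def pairwise_orthologs_query(protein_iri: str, limit: int = 100) -> str:
    """Build a SPARQL query for the orthologs of one protein (assumed ORTH modelling)."""
    # Reject characters that would break out of the <...> IRI syntax.
    if any(ch in protein_iri for ch in '<>" {}'):
        raise ValueError(f"not a plain IRI: {protein_iri!r}")
    return f"""PREFIX orth: <http://purl.org/net/orth#>
SELECT DISTINCT ?ortholog WHERE {{
  ?cluster a orth:OrthologsCluster ;
           orth:hasHomologousMember ?node1 , ?node2 .
  # '*' also matches zero steps, so a direct member of the cluster counts.
  ?node1 orth:hasHomologousMember* <{protein_iri}> .
  ?node2 orth:hasHomologousMember* ?ortholog .
  ?ortholog a orth:Protein .
  # The two proteins must lie in different sub-trees of the same cluster.
  FILTER(?node1 != ?node2)
}}
LIMIT {limit}
"""


def run_query(endpoint: str, query: str, timeout: float = 60.0) -> list[dict]:
    """Send a query with an HTTP GET (SPARQL 1.1 Protocol) and return the JSON bindings."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        results = json.load(response)
    return results["results"]["bindings"]
```

For example, `run_query(OMA_ENDPOINT, pairwise_orthologs_query(iri))` would return one binding per ortholog, where `iri` is a protein IRI taken from the endpoint's own data; no real IRI is filled in here. Very long queries may need an HTTP POST instead of a GET.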

Highlights

  • Gene classification based on evolutionary history is essential for many aspects of comparative and functional genomics (reviewed in Gabaldón & Koonin, 2013; Glover et al., 2019)

  • Data models: we provide a brief introduction to the data models of the orthology databases considered in this article, to facilitate understanding of the SPARQL queries presented in the Protocols section

  • Protocols: we provide four protocols to (i) retrieve pairwise orthologs through SPARQL queries from the European Bioinformatics Institute (EBI), the Orthologous MAtrix (OMA) and the Microbial Genome Database (MBGD); (ii) retrieve homologous groups from OMA, MBGD and OrthoDB; (iii) restrict the search to a given taxonomic level; and (iv) perform meta-analyses across multiple data sources providing orthology information, as well as aggregations over all the data available in a given source
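Protocol (iv) performs its comparison inside a federated SPARQL query. As a client-side alternative (a swapped-in technique, not the article's method), the Python sketch below compares ortholog pairs that have already been retrieved from two sources, assuming both have been mapped to a shared identifier such as UniProt accessions. Pairs are treated as unordered, so (A, B) and (B, A) count once; the accessions in the example are hypothetical.

```python
from collections.abc import Iterable


def normalize(pairs: Iterable[tuple[str, str]]) -> set[frozenset[str]]:
    """Turn (a, b) tuples into unordered pairs, dropping self-pairs."""
    return {frozenset((a, b)) for a, b in pairs if a != b}


def compare_sources(
    source_a: Iterable[tuple[str, str]], source_b: Iterable[tuple[str, str]]
) -> dict[str, float]:
    """Count shared and source-specific ortholog pairs, plus their Jaccard index."""
    a, b = normalize(source_a), normalize(source_b)
    shared = a & b
    union = a | b
    return {
        "shared": len(shared),
        "only_a": len(a - b),
        "only_b": len(b - a),
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical accessions, for illustration only.
    pairs_from_source_a = [("P1", "Q1"), ("P2", "Q2"), ("Q3", "P3")]
    pairs_from_source_b = [("Q1", "P1"), ("P3", "Q3"), ("P4", "Q4")]
    print(compare_sources(pairs_from_source_a, pairs_from_source_b))
    # {'shared': 2, 'only_a': 1, 'only_b': 1, 'jaccard': 0.5}
```

Doing the comparison inside a federated query, as the article does, avoids downloading both result sets; the client-side version is simpler to inspect but only practical when the retrieved sets are small.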

METHOD ARTICLE

A hands-on introduction to querying evolutionary relationships across multiple data sources using SPARQL [version 1; peer review: 1 approved, 2 approved with reservations]

Ana Claudia Sima, Christophe Dessimoz, Kurt Stockinger, Monique Zahn-Zabal, Tarcisio Mendes de Farias
