Abstract
The diversity of online resources storing biological data in different formats provides a challenge for bioinformaticians to integrate and analyse their biological data. The semantic web provides a standard to facilitate knowledge integration using statements built as triples describing a relation between two objects. WikiPathways, an online collaborative pathway resource, is now available in the semantic web through a SPARQL endpoint at http://sparql.wikipathways.org. Having biological pathways in the semantic web allows rapid integration with data from other resources that contain information about elements present in pathways using SPARQL queries. In order to convert WikiPathways content into meaningful triples we developed two new vocabularies that capture the graphical representation and the pathway logic, respectively. Each gene, protein, and metabolite in a given pathway is defined with a standard set of identifiers to support linking to several other biological resources in the semantic web. WikiPathways triples were loaded into the Open PHACTS discovery platform and are available through its Web API (https://dev.openphacts.org/docs) to be used in various tools for drug development. We combined various semantic web resources with the newly converted WikiPathways content using a variety of SPARQL query types and third-party resources, such as the Open PHACTS API. The ability to use pathway information to form new links across diverse biological data highlights the utility of integrating WikiPathways in the semantic web.
Highlights
Pathway analysis and visualisation of data on pathways provide insights into the underlying biology of effects found in genomics, proteomics, and metabolomics experiments [1,2,3,4]
We combined various semantic web resources with the newly converted WikiPathways content using a variety of SPARQL query types and third-party resources, such as the Open PHACTS API
In order to contribute pathway knowledge to the semantic web, we have modeled the content of WikiPathways to form triple-based statements
Summary
Pathway analysis and visualisation of data on pathways provide insights into the underlying biology of effects found in genomics, proteomics, and metabolomics experiments [1,2,3,4]. Elements like genes, proteins, metabolites, and interactions are identified using common accession numbers from reference databases such as Entrez Gene [7], Ensembl [8], UniProt [9], HMDB [10], ChemSpider [11], PubChem [12] and ChEMBL [13]. Multiple databases can be referenced to annotate an element of the same semantic type, e.g. Ensembl and Entrez Gene to annotate gene information. In WikiPathways we use the open source software framework BridgeDb [14], to help resolve different identifiers representing the same (or related) entities. The semantic web provides an approach to define entities and their relationships. The Resource Description Framework (RDF) consists of two key components: statements and universal identifiers. The following triple defines the glucose molecule as being part of the glycolysis pathway:
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have