Abstract

Background

The amount of data generated from genome-wide association studies (GWAS) has grown rapidly, but considerations for GWAS phenotype data reuse and interchange have not kept pace. This impacts on the work of GWAS Central – a free and open access resource for the advanced querying and comparison of summary-level genetic association data. The benefits of employing ontologies for standardising and structuring data are widely accepted. The complex spectrum of observed human phenotypes (and traits), and the requirement for cross-species phenotype comparisons, calls for reflection on the most appropriate solution for the organisation of human phenotype data. The Semantic Web provides standards for the possibility of further integration of GWAS data and the ability to contribute to the web of Linked Data.

Results

A pragmatic consideration when applying phenotype ontologies to GWAS data is the ability to retrieve all data, at the most granular level possible, from querying a single ontology graph. We found the Medical Subject Headings (MeSH) terminology suitable for describing all traits (diseases and medical signs and symptoms) at various levels of granularity and the Human Phenotype Ontology (HPO) most suitable for describing phenotypic abnormalities (medical signs and symptoms) at the most granular level. Diseases within MeSH are mapped to HPO to infer the phenotypic abnormalities associated with diseases. Building on the rich semantic phenotype annotation layer, we are able to make cross-species phenotype comparisons and publish a core subset of GWAS data as RDF nanopublications.

Conclusions

We present a methodology for applying phenotype annotations to a comprehensive genome-wide association dataset and for ensuring compatibility with the Semantic Web. The annotations are used to assist with cross-species genotype and phenotype comparisons. However, further processing and deconstructions of terms may be required to facilitate automatic phenotype comparisons. The provision of GWAS nanopublications enables a new dimension for exploring GWAS data, by way of intrinsic links to related data resources within the Linked Data web. The value of such annotation and integration will grow as more biomedical resources adopt the standards of the Semantic Web.
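The idea of retrieving all data from a single ontology graph can be illustrated with a small sketch. This is not the GWAS Central implementation: the hierarchy, the term labels, the study identifiers and the function names below are all hypothetical, chosen only to show how a query on a broad term can be expanded to its more granular descendants before matching annotations.

```python
from collections import deque

# Toy is-a hierarchy (parent -> children). Labels are illustrative only
# and are not taken from the paper or from a real MeSH/HPO release.
CHILDREN = {
    "Metabolic Diseases": ["Lipid Metabolism Disorders", "Glucose Metabolism Disorders"],
    "Lipid Metabolism Disorders": ["Hyperlipidemias"],
    "Glucose Metabolism Disorders": ["Diabetes Mellitus"],
    "Hyperlipidemias": [],
    "Diabetes Mellitus": [],
}

# Hypothetical study -> annotated term assignments.
ANNOTATIONS = {
    "study_A": "Hyperlipidemias",
    "study_B": "Diabetes Mellitus",
    "study_C": "Lipid Metabolism Disorders",
}


def descendants(term):
    """Return the term itself plus every term below it in the hierarchy."""
    seen = {term}
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for child in CHILDREN.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def studies_for(term):
    """Find studies annotated with the term or any of its descendants."""
    terms = descendants(term)
    return sorted(study for study, t in ANNOTATIONS.items() if t in terms)


if __name__ == "__main__":
    print(studies_for("Lipid Metabolism Disorders"))
```

Querying a broad term returns studies annotated at any finer level beneath it, which is the practical benefit of keeping all annotations reachable from one graph.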

Highlights

  • The amount of data generated from genome-wide association studies (GWAS) has grown rapidly, but considerations for GWAS phenotype data reuse and interchange have not kept pace

  • Our ambition to find a single ontology capable of describing the broad spectrum of GWAS phenotypes was pragmatically driven by a requirement to have a single ontology to query the entire database against

  • The output includes three annotations from GWAS Central, three annotations from EuroPhenome as a result of the high-throughput phenotyping of a Baz1b knockout mouse line, and 28 annotations from Mouse Genome Database (MGD) derived from published and other sources (Table 2). Manual inspection of these results shows that both GWAS Central and EuroPhenome annotations relate to lipid phenotypes


Summary

Introduction

In recent years the amount of data generated from genome-wide association studies (GWAS) has increased rapidly. GWAS Central [gwascentral.org] (established in 2007 under the name HGVbaseG2P [3]) is a comprehensive central collection of genetic association data with a focus on advanced tools to integrate, search and compare summary-level data sets. The modular architecture of GWAS Central allows the infrastructure to be extended for use with different types of data, and it is anticipated that, through future support from the BioSHaRE project [http://www.bioshare.eu], GWAS Central will be extended to integrate exome and next-generation sequencing data.
