ENGINES: exploring single nucleotide variation in entire human genomes

Jorge Amigo, Antonio Salas, Christopher Phillips

doi:10.1186/1471-2105-12-105


Open Access

https://doi.org/10.1186/1471-2105-12-105


Journal: BMC Bioinformatics	Publication Date: Apr 19, 2011
Citations: 45	License type: CC BY 2.0

Affiliation: University of Santiago de Compostela

Abstract

Background
Next-generation ultra-sequencing technologies are starting to produce extensive quantities of data from entire human genome or exome sequences, so new software is needed to present and analyse this vast amount of information. The 1000 Genomes Project has recently released raw data for 629 complete genomes representing several human populations through its Phase I interim analysis. Although some public tools allow exploration of these genomes, to date no tool permits comprehensive population analysis of the variation catalogued in such data.

Description
We have developed a genetic variant site explorer able to retrieve data for Single Nucleotide Variants (SNVs), population by population, from entire genomes without compromising future scalability and agility. ENGINES (ENtire Genome INterface for Exploring SNVs) uses data from the 1000 Genomes Phase I to demonstrate its capacity to handle large amounts of genetic variation (>7.3 billion genotypes and 28 million SNVs) and to derive summary statistics of interest for medical and population genetics applications. The whole dataset is pre-processed and summarized into a data mart accessible through a web interface. The query system allows each available population sample to be combined with or compared against the others, with searches by rs-number list, chromosome region, or genes of interest. Frequency and FST filters are available to further refine queries, and results can be visually compared with other large-scale Single Nucleotide Polymorphism (SNP) repositories such as HapMap or Perlegen.

Conclusions
ENGINES is capable of accessing large-scale variation data repositories quickly and comprehensively. It allows fast browsing of whole-genome variation while providing statistical information for each variant site, such as allele frequency, heterozygosity, or FST values for genetic differentiation.
The data mart generation scripts and the web interface are available at http://spsmart.cesga.es/engines.php
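The abstract names allele frequency, heterozygosity and FST as per-site statistics, together with frequency and FST filters, but does not say which estimators ENGINES uses. The following is a minimal sketch under stated assumptions: expected heterozygosity is taken as 2p(1 - p), and FST uses Nei's (HT - HS)/HT with an unweighted mean over the selected populations. `GenotypeCounts`, `passes_filters` and the threshold defaults are hypothetical names and values, not part of ENGINES.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class GenotypeCounts:
    """Genotype counts for one population at one biallelic site (hypothetical layout)."""
    hom_ref: int  # reference homozygotes
    het: int      # heterozygotes
    hom_alt: int  # alternative homozygotes

    @property
    def n_alleles(self) -> int:
        return 2 * (self.hom_ref + self.het + self.hom_alt)

    @property
    def alt_count(self) -> int:
        return self.het + 2 * self.hom_alt


def alt_frequency(g: GenotypeCounts) -> float:
    """Alternative-allele frequency p in one population sample."""
    return g.alt_count / g.n_alleles


def expected_heterozygosity(p: float) -> float:
    """Expected heterozygosity under Hardy-Weinberg equilibrium: 2p(1 - p)."""
    return 2.0 * p * (1.0 - p)


def fst(pops: Dict[str, GenotypeCounts]) -> float:
    """FST = (HT - HS) / HT for one site (Nei's formulation, assumed here).

    HS: unweighted mean of within-population expected heterozygosities.
    HT: expected heterozygosity at the mean allele frequency across populations.
    Monomorphic sites (HT == 0) return 0.0.
    """
    freqs = [alt_frequency(g) for g in pops.values()]
    hs = sum(expected_heterozygosity(p) for p in freqs) / len(freqs)
    ht = expected_heterozygosity(sum(freqs) / len(freqs))
    return 0.0 if ht == 0.0 else (ht - hs) / ht


def passes_filters(pops: Dict[str, GenotypeCounts],
                   min_maf: float = 0.05, min_fst: float = 0.1) -> bool:
    """Hypothetical frequency + FST filter; the thresholds are illustrative only."""
    freqs = [alt_frequency(g) for g in pops.values()]
    p_bar = sum(freqs) / len(freqs)
    maf = min(p_bar, 1.0 - p_bar)  # minor allele frequency of the unweighted mean
    return maf >= min_maf and fst(pops) >= min_fst


if __name__ == "__main__":
    # Illustrative counts, not 1000 Genomes data: a fixed difference between two samples.
    site = {"CEU": GenotypeCounts(50, 0, 0), "YRI": GenotypeCounts(0, 0, 50)}
    print(fst(site), passes_filters(site))  # 1.0 True
```

A fixed difference between two samples gives FST = 1, while identical frequencies give FST = 0. Other estimators (for example Weir and Cockerham's, which corrects for sample size) would give different values on small samples.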

Highlights

Next-generation ultra-sequencing technologies are starting to produce extensive quantities of data from entire human genome or exome sequences, and new software is needed to present and analyse this vast amount of information
The data mart generation scripts and the web interface are available at http://spsmart.cesga.es/engines.php
We have developed ENGINES, a human genome variant site browser dedicated in the first instance to the flexible and thorough analysis of the Single Nucleotide Variation (SNV) catalogue generated by the 1000 Genomes Phase I interim analysis; it will subsequently integrate new whole-genome sequence data from other sources as these become publicly available

Summary

Introduction

Next-generation ultra-sequencing technologies are starting to produce extensive quantities of data from entire human genome or exome sequences, and new software is needed to present and analyse this vast amount of information. The first pilot study of the 1000 Genomes Project (Pilot 1) assessed the strategy of sharing data across samples in whole-genome sequencing at relatively low coverage (2-4x). It presented 179 genomes from the four population panels previously characterised by HapMap (CEU, CHB, JPT and YRI), describing ~14 million variants. The recent release of an interim analysis of the project's Phase I has considerably enriched the available data: 629 entire genomes from 12 populations, describing ~28 million variants. These populations are: individuals of African ancestry in Southwest USA (ASW), Utah residents with N & W European ancestry from the CEPH collection (CEU), Han Chinese in Beijing, China (CHB), Han Chinese South (CHS), Finnish in Finland (FIN), British in England and Scotland (GBR), Japanese in Tokyo, Japan (JPT), Luhya in Webuye, Kenya (LWK), individuals of Mexican ancestry in Los Angeles, California (MXL), Puerto Ricans in Puerto Rico (PUR), Tuscans in Italy (TSI), and Yoruba in Ibadan, Nigeria (YRI).
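The abstract describes a query system that combines population samples such as those listed above and searches by rs-number list or chromosome region. ENGINES' actual data mart schema is not described here, so the following is only a minimal sketch under assumptions: per-site genotype counts are keyed by population code, selected samples are pooled by summing their counts, and region queries use binary search over sorted positions. `Site`, `SiteIndex`, `pooled_alt_frequency` and the `snv_*` identifiers are hypothetical, and the counts are toy values.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

# Genotype counts for one population at one site: (hom_ref, het, hom_alt).
Counts = Tuple[int, int, int]


@dataclass
class Site:
    site_id: str                # stands in for an rs-number
    chrom: str
    pos: int                    # 1-based position
    counts: Dict[str, Counts]   # keyed by population code, e.g. "CEU", "GBR"


def pooled_alt_frequency(site: Site, pops: Sequence[str]) -> float:
    """Alternative-allele frequency after merging the chosen population samples."""
    hom_ref = sum(site.counts[p][0] for p in pops)
    het = sum(site.counts[p][1] for p in pops)
    hom_alt = sum(site.counts[p][2] for p in pops)
    return (het + 2 * hom_alt) / (2 * (hom_ref + het + hom_alt))


class SiteIndex:
    """In-memory stand-in for a summarized variant table (hypothetical)."""

    def __init__(self, sites: List[Site]):
        self._by_id = {s.site_id: s for s in sites}
        self._by_chrom: Dict[str, List[Site]] = {}
        for s in sorted(sites, key=lambda x: (x.chrom, x.pos)):
            self._by_chrom.setdefault(s.chrom, []).append(s)
        # Sorted position lists allow binary search for region queries.
        self._positions = {c: [s.pos for s in ss] for c, ss in self._by_chrom.items()}

    def lookup(self, site_ids: Sequence[str]) -> List[Site]:
        """Sites matching an identifier list, in input order; unknown IDs are skipped."""
        return [self._by_id[i] for i in site_ids if i in self._by_id]

    def region(self, chrom: str, start: int, end: int) -> List[Site]:
        """Sites on `chrom` with start <= pos <= end."""
        positions = self._positions.get(chrom, [])
        lo, hi = bisect_left(positions, start), bisect_right(positions, end)
        return self._by_chrom.get(chrom, [])[lo:hi]


if __name__ == "__main__":
    # Toy counts for illustration only (not real data).
    idx = SiteIndex([
        Site("snv_a", "1", 1000, {"CEU": (10, 0, 0), "GBR": (0, 10, 0)}),
        Site("snv_b", "1", 5000, {"CEU": (5, 5, 0), "GBR": (5, 5, 0)}),
        Site("snv_c", "2", 3000, {"CEU": (0, 0, 10), "GBR": (0, 0, 10)}),
    ])
    for s in idx.region("1", 500, 2000):
        print(s.site_id, pooled_alt_frequency(s, ["CEU", "GBR"]))  # snv_a 0.25
```

Summing genotype counts weights each population by its sample size. Averaging per-population frequencies would instead give every population equal weight. Which of the two a browser should use depends on the question being asked.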

