Abstract
Background
Recent studies generating complete human sequences from Asian, African and European subgroups have revealed population-specific variation and disease susceptibility loci. Here, choosing a DNA sample from a population of interest due to its relative geographical isolation and genetic impact on further populations, we extend the above studies through the generation of 11-fold coverage of the first Irish human genome sequence.
Results
Using sequence data from a branch of the European ancestral tree as yet unsequenced, we identify variants that may be specific to this population. Through comparisons with HapMap and previous genetic association studies, we identified novel disease-associated variants, including a novel nonsense variant putatively associated with inflammatory bowel disease. We describe a novel method for improving SNP calling accuracy at low genome coverage using haplotype information. This analysis has implications for future re-sequencing studies and validates the imputation of Irish haplotypes using data from the current Human Genome Diversity Cell Line Panel (HGDP-CEPH). Finally, we identify gene duplication events as constituting significant targets of recent positive selection in the human lineage.
Conclusions
Our findings show that there remains utility in generating whole genome sequences to illustrate both general principles and reveal specific instances of human biology. With increasing access to low cost sequencing we would predict that even armed with the resources of a small research group a number of similar initiatives geared towards answering specific biological questions will emerge.
Highlights
Recent studies generating complete human sequences from Asian, African and European subgroups have revealed population-specific variation and disease susceptibility loci
Using the SIFT program [24], we investigated whether the novel non-synonymous single nucleotide polymorphisms (SNPs) in putative linkage disequilibrium (LD) with risk markers were enriched for SNPs predicted to be deleterious. We found such an enrichment, as one would expect if an elevated number of these SNPs were conferring risk of the relevant disease
We provide a novel technique for SNP calling in human genome sequence using haplotype data and validate the imputation of Irish haplotypes using data from the current Human Genome Diversity Panel (HGDP-CEPH)
Summary
Recent studies generating complete human sequences from Asian, African and European subgroups have revealed population-specific variation and disease susceptibility loci. The two versions of the human genome published in 2001, while both seminal achievements, were mosaic renderings of a number of individual genomes. It has been clear for some time that sequencing additional representative genomes would be needed for a more complete understanding of genomic variation and its relationship to human biology. Whole genome sequences have recently been generated from diverse human populations, and studies of genetic diversity at the population level have unveiled some interesting findings [8]. These data look set to be dramatically extended with releases of data from the 1000 Genomes project [9]. Representation of Europe will come from European American samples from Utah and Italian, Spanish, British and Finnish samples.