Abstract

Background: Due to the rapid progress of next-generation sequencing (NGS) facilities, an explosion of human whole genome data will become available in the coming years. These data can be used to optimize and to increase the resolution of the phylogenetic Y chromosomal tree. Moreover, the exponential growth of known Y chromosomal lineages will require an automatic determination of the phylogenetic position of an individual based on whole genome SNP calling data and an up-to-date Y chromosomal tree.

Results: We present an automated approach, ‘AMY-tree’, which is able to determine the phylogenetic position of a Y chromosome using a whole genome SNP profile, independently of the NGS platform and SNP calling program, whereby mistakes in the SNP calling or in the phylogenetic Y chromosomal tree are taken into account. Moreover, AMY-tree indicates ambiguities within the present phylogenetic tree and points out new Y-SNPs which may be phylogenetically relevant. The AMY-tree software package was validated successfully on 118 whole genome SNP profiles of 109 males with different origins. Moreover, support was found for an unknown recurrent mutation, wrongly reported mutation conversions and a large number of new, interesting Y-SNPs.

Conclusions: AMY-tree is therefore a useful tool to determine the Y lineage of a sample based on SNP calling, to identify Y-SNPs with a yet unknown phylogenetic position and to optimize the Y chromosomal phylogenetic tree in the future. AMY-tree will not add lineages to the existing phylogenetic tree of the Y chromosome, but it is the first step towards analysing whole genome SNP profiles in a phylogenetic framework.
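To make the idea of determining a phylogenetic position from a SNP profile concrete, the following is a minimal sketch in Python. It is not the AMY-tree algorithm itself, which additionally accounts for mistakes in the SNP calling and in the tree. The toy tree, the lineage names (`A`, `A1`, ...), the marker names (`snp_a`, ...) and the majority rule used to accept a branch are all assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

# Toy Y chromosomal tree: each lineage lists its child lineages and the
# Y-SNPs that define it. All lineage and marker names are hypothetical.
TREE: Dict[str, dict] = {
    "root": {"children": ["A", "B"], "markers": []},
    "A": {"children": ["A1", "A2"], "markers": ["snp_a"]},
    "A1": {"children": [], "markers": ["snp_a1x", "snp_a1y"]},
    "A2": {"children": [], "markers": ["snp_a2"]},
    "B": {"children": [], "markers": ["snp_b"]},
}


def count_states(markers: List[str], calls: Dict[str, str]) -> Tuple[int, int]:
    """Count derived and ancestral calls; uncalled markers are ignored."""
    derived = sum(1 for m in markers if calls.get(m) == "derived")
    ancestral = sum(1 for m in markers if calls.get(m) == "ancestral")
    return derived, ancestral


def place_sample(calls: Dict[str, str],
                 tree: Dict[str, dict] = TREE,
                 root: str = "root") -> List[str]:
    """Walk down from the root, following the single supported child branch.

    A child is treated as supported when more of its defining markers are
    called derived than ancestral (an assumed, simplistic rule). The walk
    stops when no child, or more than one child, is supported.
    """
    path = [root]
    node = root
    while True:
        supported = []
        for child in tree[node]["children"]:
            derived, ancestral = count_states(tree[child]["markers"], calls)
            if derived > ancestral:
                supported.append(child)
        if len(supported) != 1:
            break  # terminal lineage, missing data, or an ambiguity
        node = supported[0]
        path.append(node)
    return path


# Example SNP profile: snp_a1y was not called at all.
example_calls = {
    "snp_a": "derived",
    "snp_a1x": "derived",
    "snp_a2": "ancestral",
    "snp_b": "ancestral",
}
print(" > ".join(place_sample(example_calls)))
```

In this sketch an uncalled marker neither supports nor contradicts a branch, and a sample that supports two sibling branches is left at their parent, which is one simple way to flag the kind of ambiguity the abstract mentions.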

Highlights

  • This expectation is based on the thousands of unknown Y-SNPs observed in recent whole genome studies such as the Irish genome project [12], on the abundance of polytomies in the phylogenetic tree of several haplogroups [11], and on the many recent publications reporting new Y chromosomal lineages [13,14].

  • The SNPs were called using the Quality Y-SNP allele calling. Sufficient Y-SNP calling quality was found for all the high sequence coverage data of Complete Genomics, the medium sequence coverage data of the Irish Genome project and two medium coverage samples of the Khoisan and Bantu genomes project.

Summary

Introduction

Due to the rapid progress of next-generation sequencing (NGS) facilities, an explosion of human whole genome data will become available in the coming years. These data can be used to optimize and to increase the resolution of the phylogenetic Y chromosomal tree. Numerous described sub-haplogroups are still clearly paraphyletic within network analyses based on Y chromosomal Short Tandem Repeats (Y-STRs), for example sub-haplogroups G-P303* and J-M410* [15,16]. Based on these Y-STR networks, it is clear that Y-SNPs cannot yet distinguish several phylogenetic groups which may be relevant for several disciplines and applications of the Y chromosomal tree.
