Abstract
Background: Large-scale genome sequencing poses enormous problems for the logistics of laboratory work and data handling. When numerous fragments of different genomes are PCR amplified and sequenced in a laboratory, there is a high inherent risk of sample confusion. For genetic markers that are free of natural recombination, such as mitochondrial DNA (mtDNA), single instances of sample mix-up involving different branches of the mtDNA phylogeny would give rise to reticulate patterns and should therefore be detectable.
Methodology/Principal Findings: We have developed a strategy for comparing new complete mtDNA genomes, one by one, to a current skeleton of the worldwide mtDNA phylogeny. The mutations distinguishing the reference sequence from a putative recombinant sequence can then be allocated to two or more different branches of this phylogenetic skeleton. Thus, one would search for two (or three) near-matches in the total mtDNA database that together best explain the variation seen in the recombinant. The evolutionary pathway from the mtDNA tree connecting this pair, together with the recombinant, then generates a grid-like median network, from which one can read off the exchanged segments.
Conclusions: We have applied this procedure to a large collection of complete human mtDNA sequences, in which our method identified several recombinants. All these recombinant sequences were subsequently corrected by de novo experiments, with results fully concordant with the predictions from our data-analytical approach.
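The search for two near-matches that together explain a suspected recombinant can be illustrated with a toy sketch. This is not the authors' implementation: it assumes each sequence is reduced to a set of mutated positions relative to the reference, it treats the genome as linear with one exchanged segment, and it uses a brute-force search with no phylogenetic skeleton. The names `best_single`, `best_pair`, `hapA`, `hapB` and all positions are hypothetical.

```python
from itertools import permutations

def mismatch(a, b):
    """Number of mutations present in exactly one of the two sets."""
    return len(a ^ b)

def hybrid(background, donor, start, end):
    """Mutations of `background` outside [start, end) plus those of `donor` inside it."""
    outside = {p for p in background if not start <= p < end}
    inside = {p for p in donor if start <= p < end}
    return outside | inside

def best_single(query, database):
    """Closest single database entry, returned as (mismatches, name)."""
    return min((mismatch(query, muts), name) for name, muts in database.items())

def best_pair(query, database):
    """Best (mismatches, background, donor, start, end) over all ordered pairs."""
    best = None
    for (name_a, a), (name_b, b) in permutations(database.items(), 2):
        cuts = sorted(query | a | b)
        if not cuts:
            continue
        cuts.append(cuts[-1] + 1)  # allow a segment that runs past the last mutation
        for i, start in enumerate(cuts):
            for end in cuts[i + 1:]:
                score = mismatch(query, hybrid(a, b, start, end))
                if best is None or score < best[0]:
                    best = (score, name_a, name_b, start, end)
    return best

# Hypothetical toy data: mutated positions relative to a reference sequence.
database = {
    "hapA": {73, 263, 1438, 4769, 8860, 15326},
    "hapB": {73, 150, 263, 3010, 9540, 10398, 16223},
}
# An artificial mix-up: hapA with its middle segment replaced by hapB's.
query = {73, 263, 1438, 3010, 9540, 10398, 15326}

single = best_single(query, database)
pair = best_pair(query, database)
print("best single match:", single)
print("best pair:", pair)
```

In this toy case the closest single entry still leaves four mutations unexplained, whereas the pair `hapA` plus `hapB` explains the query with no mismatches. The reported segment bounds are only the flanking mutated positions, not the true breakpoints, which can lie anywhere between informative sites.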
Highlights
With the progress of large-scale genome sequencing in recent years, researchers are beginning to explore the possibilities of detecting errors and improving the overall quality of sequencing results.
We have applied this procedure to a large collection of complete human mitochondrial DNA (mtDNA) sequences, in which our method identified several recombinants.
The human genome project originally sought to attain an overall error rate of less than one error per 10,000 base pairs. If this error rate applied to the sequencing of the entire human mitochondrial genome, which comprises about 16,570 base pairs, the majority of complete mtDNA sequences in a database would carry one or more incorrect bases. The earliest sequencing attempts typically reached this level of error, and most recent clinical mtDNA studies fare no better and sometimes much worse [3,4,5].
Summary
With the progress of large-scale genome sequencing in recent years, researchers are beginning to explore the possibilities of detecting errors and improving the overall quality of sequencing results. If two complete human mtDNA sequences sampled in some geographic region typically differ at approximately 30 bases, then about 10% of the mismatches would be due to artefacts under an error rate of 1:10,000. Such a number of errors would be far too high for most medical and forensic studies of human mtDNA. When numerous fragments of different genomes are PCR amplified and sequenced in a laboratory, there is a high inherent risk of sample confusion. For genetic markers that are free of natural recombination, such as mitochondrial DNA (mtDNA), single instances of sample mix-up involving different branches of the mtDNA phylogeny would give rise to reticulate patterns and should be detectable.
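The figures quoted above can be checked with a short back-of-the-envelope calculation. The sketch below assumes that errors occur independently at each base, that an error in either of two compared sequences shows up as a mismatch, and that the roughly 30 observed differences include those artefacts. These are simplifying assumptions made for illustration and are not stated in the text.

```python
GENOME_LENGTH = 16_570     # approximate mtDNA length in base pairs (from the text)
ERROR_RATE = 1 / 10_000    # one error per 10,000 bases (from the text)
PAIRWISE_DIFFERENCES = 30  # typical mismatches between two sequences (from the text)

def expected_errors(length, rate):
    """Expected number of erroneous bases in one complete sequence."""
    return length * rate

def prob_at_least_one_error(length, rate):
    """Probability that a sequence carries one or more errors,
    assuming independent errors at each base."""
    return 1.0 - (1.0 - rate) ** length

def artefact_fraction(length, rate, observed_mismatches):
    """Share of pairwise mismatches attributable to errors,
    counting the expected errors of both sequences."""
    return 2 * expected_errors(length, rate) / observed_mismatches

per_genome = expected_errors(GENOME_LENGTH, ERROR_RATE)
p_any = prob_at_least_one_error(GENOME_LENGTH, ERROR_RATE)
fraction = artefact_fraction(GENOME_LENGTH, ERROR_RATE, PAIRWISE_DIFFERENCES)

print(f"expected errors per genome: {per_genome:.2f}")
print(f"P(at least one error):      {p_any:.2f}")
print(f"artefact share of mismatches: {fraction:.2f}")
```

Under these assumptions each genome carries about 1.7 expected errors, roughly four in five sequences contain at least one error, and about 11% of the pairwise mismatches are artefacts, in line with the "majority" and "about 10%" statements.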