Abstract

Background: Determining whether two DNA samples originate from the same individual is difficult when the amount of retrievable DNA is limited. This is often the case for ancient, historic, and forensic samples. The most widely used approaches rely on amplification of a defined panel of multi-allelic markers and comparison to similar data from other samples. When the amount of retrievable DNA is low, these approaches fail.

Results: We describe a new method for assessing whether shotgun DNA sequence data from two samples are consistent with originating from the same or different individuals. Our approach makes use of the large catalogs of single nucleotide polymorphism (SNP) markers to maximize the chances of observing potentially discriminating alleles. We further reduce the amount of data required by taking advantage of patterns of linkage disequilibrium modeled by a reference panel of haplotypes to indirectly compare observations at pairs of linked SNPs. Using both coalescent simulations and real sequencing data from modern and ancient sources, we show that this approach is robust with respect to the reference panel and has power to detect positive identity from DNA libraries with less than 1% random and non-overlapping genome coverage in each sample.

Conclusion: We present a powerful new approach that can determine whether DNA from two samples originated from the same individual even when only minute quantities of DNA are recoverable from each.

Electronic supplementary material: The online version of this article (doi:10.1186/s12864-015-2241-6) contains supplementary material, which is available to authorized users.
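To make the idea of "discriminating alleles" concrete, the sketch below scores only the simplest case: SNPs at which each sample has a single observed allele at the same site. It is not the authors' method, which additionally uses linkage disequilibrium from a reference haplotype panel to compare observations at pairs of linked SNPs. The sketch assumes biallelic SNPs with known alternate-allele frequencies strictly between 0 and 1, Hardy-Weinberg equilibrium, unrelated individuals under the "different" hypothesis, and no sequencing error; the function names and SNP identifiers are hypothetical.

```python
import math

def site_likelihoods(p):
    """Likelihoods of an (allele in sample 1, allele in sample 2) pair at one SNP.

    Alleles are coded 1 (alternate, frequency p) and 0 (reference, frequency q).
    Assumes 0 < p < 1, Hardy-Weinberg equilibrium, no sequencing error, and
    one randomly chosen chromosome observed per sample.
    """
    q = 1.0 - p
    # Same individual: both reads come from one genotype (AA, Aa or aa).
    # A heterozygote (probability 2pq) yields each ordered pair with prob 1/4.
    same = {
        (1, 1): p * p + p * q / 2,
        (0, 0): q * q + p * q / 2,
        (1, 0): p * q / 2,
        (0, 1): p * q / 2,
    }
    # Different (unrelated) individuals: the two alleles are independent draws.
    diff = {
        (1, 1): p * p,
        (0, 0): q * q,
        (1, 0): p * q,
        (0, 1): p * q,
    }
    return same, diff

def log_likelihood_ratio(obs1, obs2, freqs):
    """Sum of log(P(pair | same) / P(pair | different)) over SNPs seen in both samples.

    obs1, obs2: dict mapping SNP id -> observed allele (0 or 1).
    freqs: dict mapping SNP id -> alternate-allele frequency.
    """
    llr = 0.0
    for snp in obs1.keys() & obs2.keys():
        same, diff = site_likelihoods(freqs[snp])
        pair = (obs1[snp], obs2[snp])
        llr += math.log(same[pair] / diff[pair])
    return llr

freqs = {"snp1": 0.1, "snp2": 0.5, "snp3": 0.3}
obs1 = {"snp1": 1, "snp2": 0}
obs2 = {"snp1": 1, "snp2": 1, "snp3": 0}
llr = log_likelihood_ratio(obs1, obs2, freqs)
print(round(llr, 4))  # 1.0116
```

A positive total favors the same-individual hypothesis and a negative one favors different individuals. A shared rare allele (snp1 here) carries the most evidence, while a mismatch (snp2) counts against identity, which is one reason large SNP catalogs help: they offer more chances to observe informative sites.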

Highlights

  • Determining whether two DNA samples originate from the same individual is difficult when the amount of retrievable DNA is limited

  • In each round of simulation, we generated two diploid individuals by drawing two haplotypes each from the simulation results. Alleles from these individuals were sampled at a rate of 0.02 to produce sets of allelic observations similar to what could be achieved in genome sequencing from libraries that represent 0.02-fold genome coverage

  • From 100 rounds of simulation, we found our method can consistently distinguish between two single-allele observation sets that originate from the same or different individuals (Fig. 2a) when the reference panel and diploid individuals are drawn from the same population
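The sampling step in the simulation described above can be sketched as follows. Instead of coalescent simulations, this sketch draws hypothetical haplotypes with independent sites and random allele frequencies, so it ignores linkage disequilibrium; the helper names, the pool of 10 haplotypes, and the 100,000-site length are illustrative assumptions. Each individual is built from two haplotypes, and alleles are sampled at a per-site rate of 0.02, picking one of the two chromosomes at each observed site.

```python
import random

def random_haplotypes(n_haplotypes, n_sites, rng):
    """Hypothetical stand-in for coalescent output: independent biallelic sites."""
    freqs = [rng.uniform(0.05, 0.95) for _ in range(n_sites)]
    haplotypes = [[int(rng.random() < f) for f in freqs] for _ in range(n_haplotypes)]
    return haplotypes, freqs

def make_two_individuals(haplotypes, rng):
    """Draw four distinct haplotypes and pair them into two diploid individuals."""
    picked = rng.sample(haplotypes, 4)
    return picked[:2], picked[2:]

def sample_observations(individual, rate, rng):
    """Return {site: allele} for sites observed at the given per-site rate.

    At each observed site one of the two haplotypes is chosen uniformly,
    mimicking a single read that covers that position.
    """
    obs = {}
    for site in range(len(individual[0])):
        if rng.random() < rate:
            obs[site] = rng.choice(individual)[site]
    return obs

rng = random.Random(42)
haplotypes, freqs = random_haplotypes(10, 100_000, rng)
ind_a, ind_b = make_two_individuals(haplotypes, rng)

# Same individual: two independent observation sets from ind_a.
same_1 = sample_observations(ind_a, 0.02, rng)
same_2 = sample_observations(ind_a, 0.02, rng)
# Different individuals: one observation set from each.
diff_1 = sample_observations(ind_a, 0.02, rng)
diff_2 = sample_observations(ind_b, 0.02, rng)

shared_same = same_1.keys() & same_2.keys()
shared_diff = diff_1.keys() & diff_2.keys()
print(len(same_1), len(shared_same), len(shared_diff))
```

With a per-site rate of 0.02 in each observation set, only about 0.02 × 0.02 = 0.04% of sites are expected to be observed in both, so each pair of sets shares roughly 40 of the 100,000 sites. This scarcity of directly comparable sites is what motivates borrowing information from linked SNPs.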


Summary

Introduction

Determining whether two DNA samples originate from the same individual is difficult when the amount of retrievable DNA is limited. This is often the case for ancient, historic, and forensic samples. Deep sequencing of well-preserved samples can produce many-fold coverage of the complete nuclear genome of an individual [2,3,4], but more often samples yield only small amounts of endogenous DNA, i.e., less than one-fold genome coverage [5,6,7].
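Under the standard assumption that reads fall uniformly at random across the genome (a Poisson, or Lander-Waterman, model), a library with c-fold coverage is expected to cover a fraction 1 - e^(-c) of positions at least once. The sketch below, with hypothetical function names, shows how quickly the overlap between two independent low-coverage libraries shrinks; it ignores mapping biases and the fragment-length effects typical of ancient DNA.

```python
import math

def fraction_covered(coverage):
    """Expected fraction of positions covered by at least one read at the given
    fold coverage, assuming read positions are uniform at random (Poisson model)."""
    return 1.0 - math.exp(-coverage)

def fraction_covered_in_both(cov_a, cov_b):
    """Expected fraction of positions covered in both of two independent libraries."""
    return fraction_covered(cov_a) * fraction_covered(cov_b)

print(round(fraction_covered(1.0), 3))               # 0.632
print(f"{fraction_covered_in_both(0.01, 0.01):.2e}")  # 9.90e-05
```

At one-fold coverage about 63% of positions are covered, but two independent libraries at 0.01-fold coverage are expected to share only about 1 in 10,000 positions, so methods for sparse data must work with very few directly comparable sites.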
