Abstract

Background
The study of ancient DNA is hampered by degradation, resulting in short DNA fragments. Advances in laboratory methods have made it possible to retrieve short DNA fragments, thereby improving access to DNA preserved in highly degraded, ancient material. However, such material contains large amounts of microbial contamination in addition to DNA fragments from the ancient organism. The resulting mixture of sequences constitutes a challenge for computational analysis, since microbial sequences are hard to distinguish from the ancient sequences of interest, especially when they are short.

Results
Here, we develop a method to quantify spurious alignments based on the presence or absence of rare variants. We find that spurious alignments are enriched for mismatches and insertion/deletion differences and lack substitution patterns typical of ancient DNA. The impact of spurious alignments can be reduced by filtering on these features and by imposing a sample-specific minimum length cutoff. We apply this approach to sequences from four ~430,000-year-old Sima de los Huesos hominin remains, which contain particularly short DNA fragments, and increase the amount of usable sequence data by 17–150%. This allows us to place a third specimen from the site on the Neandertal lineage.

Conclusions
Our method maximizes the sequence data amenable to genetic analysis from highly degraded ancient material and avoids pitfalls that are associated with the analysis of ultra-short DNA sequences.
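The filtering strategy summarized in the abstract (removing alignments with indels or excess mismatches, optionally requiring an ancient-DNA-like substitution, and applying a minimum length cutoff) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Alignment` record, the `keep` function, and the thresholds in the usage lines (a 20 bp cutoff, a 5% mismatch limit) are hypothetical placeholders. In practice the length cutoff would be chosen separately for each sample.

```python
from dataclasses import dataclass


@dataclass
class Alignment:
    """Hypothetical, simplified alignment record (not a real library type)."""
    length: int         # aligned sequence length in bases
    mismatches: int     # mismatches to the reference (assumed to exclude terminal C-to-T)
    indels: int         # insertion/deletion differences to the reference
    terminal_ct: bool   # True if a C-to-T difference occurs at a sequence end


def keep(aln: Alignment, min_length: int, max_mismatch_frac: float,
         require_terminal_ct: bool = False) -> bool:
    """Return True if the alignment passes all illustrative filters."""
    if aln.length < min_length:
        return False  # sample-specific minimum length cutoff
    if aln.indels > 0:
        return False  # spurious alignments are enriched for indels
    if aln.mismatches / aln.length > max_mismatch_frac:
        return False  # spurious alignments are enriched for mismatches
    if require_terminal_ct and not aln.terminal_ct:
        return False  # optionally require an ancient-DNA-like substitution
    return True


# Toy records: (length, mismatches, indels, terminal C-to-T)
alns = [
    Alignment(25, 0, 0, True),   # passes
    Alignment(25, 2, 0, False),  # 8% mismatches: fails the 5% threshold
    Alignment(18, 0, 0, True),   # shorter than the 20 bp cutoff
    Alignment(30, 0, 1, True),   # contains an indel
]
kept = [a for a in alns if keep(a, min_length=20, max_mismatch_frac=0.05)]
print(len(kept))  # 1
```

Requiring a terminal C-to-T difference (`require_terminal_ct=True`) removes more spurious alignments but also discards genuine ancient sequences that happen to carry no damage, so it trades data for specificity.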

Highlights

  • The study of ancient DNA is hampered by degradation, resulting in short DNA fragments

  • To allow fine-scale estimates of the fraction of spurious alignments in small datasets, we changed ~18 million interspersed bases in the human reference genome

  • The alignment parameters used here and in other studies [14, 20] limit the fraction of mismatches allowed per alignment to approximately 10%. For spuriously aligned sequences, this predicts a ~90% probability of matching the mutated state (the base in the modified reference) and a ~3.3% probability of matching each of the remaining three states (Fig. 1a). To test whether these predictions hold, we generated sequences from DNA that was isolated from a blood sample of a healthy human individual and heavily fragmented to mimic the size distribution of ancient DNA
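The predicted probabilities, and one simple way to turn base counts observed at the changed positions into an estimate of the spurious fraction, can be sketched in Python. The estimator is an illustrative assumption rather than the paper's exact procedure: it treats genuine human sequences as always carrying the original base at a changed position (ignoring sequencing errors, damage, and true variants), so that only spurious alignments contribute the mutated base. The function names and the counts in the usage lines are hypothetical.

```python
def spurious_state_probs(max_mismatch_frac: float = 0.10) -> tuple[float, float]:
    """Probabilities for a spuriously aligned sequence at a changed position:
    (match the mutated reference base, carry each of the other three bases).
    Assumes spurious alignments mismatch at the allowed limit and that
    mismatches are spread evenly over the three non-reference bases."""
    p_match = 1.0 - max_mismatch_frac
    p_other = max_mismatch_frac / 3.0
    return p_match, p_other


def estimate_spurious_fraction(n_mutated: int, n_total: int,
                               max_mismatch_frac: float = 0.10) -> float:
    """Estimate the fraction f of spurious alignments from base counts at
    changed positions (n_total observations, n_mutated of them carrying the
    mutated base), assuming genuine sequences never carry the mutated base:
        P(mutated) = f * p_match  =>  f = P(mutated) / p_match
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    p_match, _ = spurious_state_probs(max_mismatch_frac)
    f = (n_mutated / n_total) / p_match
    return min(max(f, 0.0), 1.0)  # clip sampling noise into [0, 1]


p_match, p_other = spurious_state_probs()
print(round(p_match, 3), round(p_other, 3))            # 0.9 0.033
# Hypothetical counts: 45 of 1,000 observations carry the mutated base
print(round(estimate_spurious_fraction(45, 1000), 3))  # 0.05
```

The three "other" probabilities plus the match probability sum to one, which is a quick consistency check on the ~90% and ~3.3% figures.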


Summary

Introduction

The study of ancient DNA is hampered by degradation, resulting in short DNA fragments. Advances in laboratory methods have made it possible to retrieve short DNA fragments, thereby improving access to DNA preserved in highly degraded, ancient material. However, such material contains large amounts of microbial contamination in addition to DNA fragments from the ancient organism. Laboratory methods have been developed to retrieve these fragments from ancient biological material [3,4,5] and to transform them efficiently into library molecules for high-throughput sequencing [6]. These developments have enabled researchers to study DNA sequences from increasingly old samples. Notable examples are the four remains from Sima de los Huesos in Spain, which, at over 400,000 years old, are by far the oldest hominin material to have yielded ancient DNA sequences to date [7, 8]. A complication is that unrelated sequences can align by chance, and the probability of such spurious alignments increases with decreasing sequence length [15].
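Why shorter sequences align spuriously more often can be seen with a toy calculation. It assumes, unrealistically, that both the query and a genome of G ≈ 3.1 Gb are uniformly random sequences and that up to 10% mismatches are tolerated; it is an illustration, not the model used in the paper. Under these assumptions, the expected number of chance matches for a sequence of length L is 2G · Σ_{i=0}^{k} C(L, i) (3/4)^i (1/4)^(L−i), with k = ⌊0.1 L⌋ and the factor 2 accounting for both strands.

```python
from math import comb, floor


def expected_random_hits(length: int, genome_size: float = 3.1e9,
                         max_mismatch_frac: float = 0.10) -> float:
    """Expected number of positions (both strands) at which a uniformly random
    sequence of `length` bases matches a uniformly random genome with at most
    floor(max_mismatch_frac * length) mismatches."""
    k = floor(max_mismatch_frac * length)
    # Binomial probability of <= k mismatches, each base matching with p = 1/4
    p = sum(comb(length, i) * 0.75**i * 0.25**(length - i) for i in range(k + 1))
    return 2 * genome_size * p


for L in (20, 25, 30, 35, 40):
    print(L, f"{expected_random_hits(L):.3g}")
```

Real genomes and microbial sequences are far from random, so this model likely underestimates chance matches, but it reproduces the steep dependence on length: the expected number of random hits falls by several orders of magnitude between 20 and 40 bases.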

