Abstract

Background: Massively parallel sequencing platforms, featuring high throughput and relatively short read lengths, are well suited to ancient DNA (aDNA) studies. Variant identification from short-read alignment could be hindered, however, by the low DNA concentrations common to historic samples, which constrain sequencing depths, and by post-mortem DNA damage patterns.

Results: We simulated pairs of sequences to act as reference and sample genomes at varied GC contents and divergence levels. Short-read sequence pools were generated from sample sequences and subjected to varying levels of “post-mortem” damage by adjusting levels of fragmentation and fragmentation biases, transition rates at sequence ends, and sequencing depths. Mapping of sample read pools to reference sequences revealed several trends, including decreased alignment success with increased read length and decreased variant recovery with increased divergence. Variants were generally called with high accuracy; however, identification of SNPs (single-nucleotide polymorphisms) was less accurate for high damage/low divergence samples. Modest increases in sequencing depth resulted in rapid gains in total variant recovery, and limited improvements to recovery of heterozygous variants.

Conclusions: This in silico study suggests aDNA-associated damage patterns minimally impact variant call accuracy and recovery from short-read alignment, while modest increases in sequencing depth can greatly improve variant recovery.

Electronic supplementary material: The online version of this article (doi:10.1186/s12864-015-1219-8) contains supplementary material, which is available to authorized users.

Highlights

  • Parallel sequencing platforms, featuring high throughput and relatively short read lengths, are well suited to ancient DNA studies

  • In its first two decades, ancient DNA (aDNA) research was primarily focused on PCR-based amplification and subsequent Sanger sequencing of selected loci and organellar genomes, with results applied to analyses of population differentiation and phylogeography, phylogenetics, and even metagenomics

  • The simulated damage included both fragmentation bias and elevated 5′ C-T/3′ G-A transitions near sequence ends. The damaged read pools were subsequently mapped back to their corresponding reference sequences using commonly applied alignment software, at a range of ultra-low to moderate sequencing coverage depths (0.1×, 0.5×, 1×, 2×, 4×, 8× and 16×). This method allowed us to directly compare known damage patterns and variant positions with those measured from mapped read pools, and to estimate whether, and to what extent, variables such as coverage depth, read length and damage level might affect variant calling
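
The kind of read-pool simulation described in the highlights can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names, the `max_rate` and `decay` parameters, and the exponential decay of damage with distance from the read end are all assumptions made for illustration. The sketch also uses a fixed read length, so it does not model fragmentation bias. It shows two ideas: converting a target coverage depth into a number of reads, and applying C-T substitutions near the 5′ end and G-A substitutions near the 3′ end.

```python
import random


def reads_for_depth(genome_length, read_length, depth):
    """Number of reads giving a mean coverage of `depth` (at least one read)."""
    return max(1, round(depth * genome_length / read_length))


def apply_end_damage(read, max_rate=0.3, decay=0.5, rng=random):
    """Apply C->T near the 5' end and G->A near the 3' end of a read.

    Assumption (hypothetical model): the substitution probability starts at
    `max_rate` at the terminal base and is multiplied by `decay` for each
    base further from that end.
    """
    bases = list(read)
    n = len(bases)
    for i, base in enumerate(bases):
        p5 = max_rate * decay ** i            # i bases from the 5' end
        p3 = max_rate * decay ** (n - 1 - i)  # n-1-i bases from the 3' end
        if base == "C" and rng.random() < p5:
            bases[i] = "T"
        elif base == "G" and rng.random() < p3:
            bases[i] = "A"
    return "".join(bases)


def sample_reads(genome, read_length, depth, rng=random, **damage):
    """Draw fixed-length reads uniformly from `genome` and damage each one."""
    n_reads = reads_for_depth(len(genome), read_length, depth)
    last_start = len(genome) - read_length
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(last_start + 1)
        fragment = genome[start:start + read_length]
        reads.append(apply_end_damage(fragment, rng=rng, **damage))
    return reads
```

A pool for each depth listed above could then be drawn with, for example, `sample_reads(genome, 50, 0.5, rng=random.Random(1))`, where the read length of 50 and the seed are arbitrary choices. Passing a seeded `random.Random` instance keeps a run reproducible.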


Summary

Introduction

Parallel sequencing platforms, featuring high throughput and relatively short read lengths, are well suited to ancient DNA (aDNA) studies. In the last decade the field of aDNA has moved from the genetic to the genomic level with the advent of massively parallel sequencing platforms. This has been accompanied by a concurrent shift in focus to full genome sequencing and assembly, and genome-scale analyses of population trends [5,6,7,8,9,10,11,12]. As a consequence of this, most reports applying high throughput sequencing to full genomes of ancient samples have been limited to one or several samples, sequenced at low or ultra-low (i.e., less than 1×) to moderate coverage depths [2]. This combination can severely impact the accuracy of variant calls [17].

