Abstract

Background: During the last decade, the analysis of ancient DNA (aDNA) sequence has become a powerful tool for the study of past human populations. However, the degraded nature of aDNA means that aDNA molecules are short and frequently mutated by post-mortem chemical modifications. These features decrease read mapping accuracy and increase reference bias, in which reads containing non-reference alleles are less likely to be mapped than those containing reference alleles. Alternative approaches have been developed to replace the linear reference with a variation graph which includes known alternative variants at each genetic locus. Here, we evaluate the use of variation graph software vg to avoid reference bias for aDNA and compare with existing methods.

Results: We use vg to align simulated and real aDNA samples to a variation graph containing 1000 Genomes Project variants and compare with the same data aligned with bwa to the human linear reference genome. Using vg leads to a balanced allelic representation at polymorphic sites, effectively removing reference bias, and more sensitive variant detection in comparison with bwa, especially for insertions and deletions (indels). Alternative approaches that use relaxed bwa parameter settings or filter bwa alignments can also reduce bias but can have lower sensitivity than vg, particularly for indels.

Conclusions: Our findings demonstrate that aligning aDNA sequences to variation graphs effectively mitigates the impact of reference bias when analyzing aDNA, while retaining mapping sensitivity and allowing detection of variation, in particular indel variation, that was previously missed.
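As an illustration of what "balanced allelic representation" means, reference bias can be summarized as the pooled fraction of mapped reads that carry the alternate allele at heterozygous sites, where an unbiased aligner would be expected to give a value close to 0.5. The sketch below is a minimal example under that assumption; the function name and the read counts are hypothetical and are not taken from the study.

```python
from typing import Iterable, Tuple


def alt_allele_fraction(counts: Iterable[Tuple[int, int]]) -> float:
    """Pooled fraction of mapped reads carrying the alternate allele.

    counts holds one (ref_reads, alt_reads) pair per heterozygous site.
    A value below 0.5 suggests that reads with the reference allele
    were mapped more often, i.e. reference bias.
    """
    ref_total = 0
    alt_total = 0
    for ref_reads, alt_reads in counts:
        ref_total += ref_reads
        alt_total += alt_reads
    total = ref_total + alt_total
    if total == 0:
        raise ValueError("no reads observed at any site")
    return alt_total / total


# Hypothetical per-site counts, invented purely for illustration.
linear_counts = [(12, 8), (9, 7), (15, 11)]   # e.g. a linear-reference alignment
graph_counts = [(10, 10), (8, 8), (13, 13)]   # e.g. a variation-graph alignment

linear_fraction = alt_allele_fraction(linear_counts)
graph_fraction = alt_allele_fraction(graph_counts)
print(f"linear: {linear_fraction:.3f}  graph: {graph_fraction:.3f}")
```

Pooling counts across sites, rather than averaging per-site fractions, keeps low-coverage sites from dominating the estimate, which matters for low-coverage aDNA data.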

Highlights

  • During the last decade, the analysis of ancient DNA sequence has become a powerful tool for the study of past human populations

  • Evaluating reference bias in ancient DNA (aDNA) using simulation: first, we used simulation to examine the impact of post-mortem deamination (PMD) on vg and bwa read alignment, including assessments after applying sequencing read modification [10] and reference genome modification [15]

  • 50-bp reads spanning variant sites on chromosome 11 of the Human Origins single nucleotide polymorphisms (SNPs) panel [20, 21], which contains a set of SNPs designed to be highly informative about the genetic diversity in human populations
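One way reads spanning a variant site could be enumerated is sketched below. It assumes that every fixed-length window covering the site is taken, once with the reference allele and once with the alternate allele. The function name, the toy reference string, and the chosen position are hypothetical, and no post-mortem deamination is modeled here.

```python
from typing import List, Tuple


def reads_spanning_site(
    reference: str, pos: int, alt: str, read_len: int = 50
) -> List[Tuple[int, str, str]]:
    """Return (start, ref_read, alt_read) for each window covering pos.

    pos is a 0-based SNP position in reference; alt is the single-base
    alternate allele. Windows that would run past either end of the
    reference are skipped.
    """
    if not 0 <= pos < len(reference):
        raise ValueError("pos is outside the reference")
    if len(alt) != 1:
        raise ValueError("this sketch only handles single-base substitutions")

    # Copy of the reference carrying the alternate allele at pos.
    alt_sequence = reference[:pos] + alt + reference[pos + 1:]

    first_start = max(0, pos - read_len + 1)
    last_start = min(pos, len(reference) - read_len)

    reads = []
    for start in range(first_start, last_start + 1):
        end = start + read_len
        reads.append((start, reference[start:end], alt_sequence[start:end]))
    return reads


# Toy 120-bp reference, invented for illustration; the SNP is A>G at position 60.
toy_reference = "ACGT" * 30
toy_reads = reads_spanning_site(toy_reference, pos=60, alt="G")
print(len(toy_reads))
```

Generating the reference-allele and alternate-allele reads in matched pairs means any difference in how often the two sets map can be attributed to the allele itself rather than to read position.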


Summary

Introduction

The analysis of ancient DNA (aDNA) sequence has become a powerful tool for the study of past human populations. Since the initial application of high-throughput sequencing to ancient human remains [2], the number of aDNA samples with available sequence data has been increasing at a fast pace, and currently, over 2000 ancient samples have been published [3]. These studies have provided insights into past population history and allow direct tests of hypotheses raised in archeology, anthropology, and linguistics [4, 5]. A number of unique and irreplaceable samples were sequenced prior to the adoption of UDG treatment. Taking all these factors into account, ancient DNA data is generally of low coverage, short length, and high intrinsic error rate.
