Abstract

Present-day contamination can lead to false conclusions in ancient DNA studies. A number of methods are available to estimate contamination; they use a variety of signals and are appropriate for different types of data. Here, an overview of currently available methods is provided, highlighting their strengths and weaknesses, and a classification based on the signals used to estimate contamination is proposed. This overview aims to enable researchers to choose the most appropriate methods for their dataset. Based on this classification, potential avenues for the further development of methods are discussed.

Highlights

  • Introduction: Ancient DNA from historical or archaeological materials, such as bones, teeth, or hair, represents a valuable resource for studying the past. […] contamination by human DNA can lead to false signals of admixture or to underestimation of the divergence to present-day humans.[2,29,30] Several precautions can guard against contamination.[17,31,32]

  • Rasmussen et al.[73,74] and Moreno-Mayar et al.[75] used known variants on the X-chromosome to detect sequences that disagree with the majority call. These alternative alleles are unexpected in males that carry only one copy of the X-chromosome and can be used, together with an estimate of sequencing error from neighboring sites, to estimate contamination. Note that this X-chromosome contamination estimate gives an upper limit on the rate of contamination in the autosomes, since contamination originating from females has twice the impact on the X-chromosome of males compared to their autosomes

  • We note that contamination in the context of metagenomics datasets can stem from misassignments of sequences from closely related endogenous species, an issue that goes beyond the scope of this review
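The X-chromosome approach in the highlights above can be illustrated with a minimal sketch. It is not the implementation of Rasmussen et al. or Moreno-Mayar et al.; it is a simple moment estimator under stated assumptions. The function names, the read-count inputs, and the parameter `p_differ` (the probability that a contaminating read carries a different allele from the endogenous one at a polymorphic site) are hypothetical. The second function turns the "twice the impact" remark into arithmetic, assuming equal DNA yield per cell and unbiased sequencing: if a fraction p of cells comes from a female contaminant, the X-chromosome contamination in a male is 2p / (1 + p), while the autosomal contamination is p.

```python
def x_contamination(snp_minor, snp_total, flank_minor, flank_total, p_differ):
    """Rough X-chromosome contamination estimate for a male sample.

    snp_minor / snp_total: reads disagreeing with the majority call, and all
        reads, summed over known polymorphic X-chromosome sites.
    flank_minor / flank_total: the same counts at neighboring sites, used as
        the sequencing-error baseline.
    p_differ: ASSUMED probability that a contaminating read differs from the
        endogenous allele at a polymorphic site (hypothetical input).
    """
    if snp_total <= 0 or flank_total <= 0 or not 0 < p_differ <= 1:
        raise ValueError("counts must be positive and 0 < p_differ <= 1")
    mismatch_rate = snp_minor / snp_total    # contamination plus error
    error_rate = flank_minor / flank_total   # error only
    # Simplified model: mismatch_rate ~ c * p_differ + error_rate
    estimate = (mismatch_rate - error_rate) / p_differ
    return min(max(estimate, 0.0), 1.0)      # clamp to a valid proportion


def autosomal_range(c_x):
    """Range of autosomal contamination implied by an X estimate in a male.

    Lower bound: all contaminants female, c_x = 2p / (1 + p), so p = c_x / (2 - c_x).
    Upper bound: all contaminants male, so the autosomal rate equals c_x.
    """
    return c_x / (2.0 - c_x), c_x
```

For example, 30 disagreeing reads out of 1000 at polymorphic sites, 10 out of 1000 at neighboring sites, and `p_differ = 0.25` give an X-chromosome estimate of about 0.08, which under these assumptions corresponds to an autosomal contamination between roughly 0.042 and 0.08.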


Summary

Classification of Signals Used to Estimate Contamination

Three main signals are informative about the presence of contamination in ancient DNA datasets: differences in the DNA sequence, deviations from the expected ploidy, and ancient DNA degradation patterns. […] By using the presence of damage-associated substitutions to enrich for sequences that stem from ancient molecules, Meyer et al.[96] reconstructed the mitochondrial genome from the highly contaminated sequences of a Neanderthal ancestor found at Sima de los Huesos in Spain and dated to over 400 000 years ago. Because coverage is high, the correct endogenous mitochondrial genome is reconstructed as long as contaminating sequences remain in the minority, at every site, among the sequences carrying C-to-T substitutions. Such an approach is not possible for nuclear genomes, where the generated sequence coverage is typically far below onefold for highly degraded samples and informative sites are not always available.
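The damage-based enrichment described above can be sketched as a two-step procedure: keep only reads that show a C-to-T mismatch near a read end, then take a per-site majority call. This is an illustrative sketch, not the pipeline of Meyer et al. It assumes gapless alignments given as hypothetical `(start, sequence)` pairs, treats a C-to-T mismatch within `n_terminal` bases of either end as the damage signal, and does not mask the damaged terminal bases before calling the consensus.

```python
from collections import Counter


def has_terminal_c_to_t(read, ref, start, n_terminal=3):
    """True if the read has a reference-C / read-T mismatch near either end."""
    length = len(read)
    head = range(min(n_terminal, length))
    tail = range(max(length - n_terminal, 0), length)
    return any(ref[start + i] == "C" and read[i] == "T"
               for i in set(head) | set(tail))


def damage_filtered_consensus(reads, ref, n_terminal=3, require_damage=True):
    """Per-site majority call, optionally restricted to damaged reads.

    reads: iterable of (start, sequence) pairs, gapless alignments to ref
        (an assumed input format for this sketch).
    Sites not covered by any retained read are reported as 'N'.
    """
    counts = [Counter() for _ in ref]
    for start, seq in reads:
        if require_damage and not has_terminal_c_to_t(seq, ref, start, n_terminal):
            continue  # likely undamaged, possibly present-day contamination
        for i, base in enumerate(seq):
            counts[start + i][base] += 1
    return "".join(c.most_common(1)[0][0] if c else "N" for c in counts)
```

The majority call recovers the endogenous base at a site only while contaminating reads are outnumbered among the retained (damaged) reads at that site, which is the condition stated in the text. In practice, terminal bases affected by damage would also need to be masked or down-weighted, which this sketch omits.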

Differences in the DNA Sequence
Deviation from the Expected Ploidy
Ancient DNA Degradation Patterns
Methods to Estimate Contamination
Mitochondrial DNA
Methods Based on Differences in the DNA Sequence
Methods Based on Deviations from the Expected Ploidy
Sex Chromosomes
Autosomal DNA
Methods Based on Patterns of Ancient DNA Damage
Perspectives
Methods Based on Characteristics of Ancient DNA
Conclusion
Findings
Conflict of Interest
