Abstract
Background
Next-Generation Sequencing has revolutionized our approach to ancient DNA (aDNA) research, by providing complete genomic sequences of ancient individuals and extinct species. However, the recovery of genetic material from long-dead organisms is still complicated by a number of issues, including post-mortem DNA damage and high levels of environmental contamination. Together with error profiles specific to the sequencing platform used, these features could limit our ability to map sequencing reads against modern reference genomes and therefore to identify endogenous ancient reads, reducing the efficiency of aDNA shotgun sequencing.
Results
In this study, we compare different computational methods for improving the accuracy and sensitivity of aDNA sequence identification, based on shotgun sequencing reads recovered from Pleistocene horse extracts using Illumina GAIIx and Helicos Heliscope platforms. We show that the Burrows-Wheeler Aligner (BWA), which was developed for mapping undamaged sequencing reads from platforms with low rates of indel-type sequencing errors, can be used at acceptable run-times by modifying its default parameters in a platform-specific manner. We also examine whether trimming likely damaged positions at read ends can increase the recovery of genuine aDNA fragments, and whether human contamination can be accurately identified using a previously suggested strategy based on best hit filtering. We show that combining our different mapping and filtering approaches can increase the number of high-quality endogenous hits recovered by up to 33%.
Conclusions
We have shown that Illumina and Helicos sequences recovered from aDNA extracts cannot be aligned to modern reference genomes with the same efficiency unless mapping parameters are optimized for the specific types of errors generated by these platforms and by post-mortem DNA damage. Our findings have important implications for future aDNA research, as we define mapping guidelines that improve our ability to identify genuine aDNA sequences, which in turn could improve the genotyping accuracy of ancient specimens. Our framework provides a significant improvement to the standard procedures used for characterizing ancient genomes, a task that is challenged by contamination and often by low amounts of DNA material.
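The read-end trimming examined in the Results can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `trim_read_ends`, the number of bases removed from each end (`n5`, `n3`) and the minimum length kept (`min_len`) are hypothetical values chosen only for illustration.

```python
from typing import Optional, Tuple


def trim_read_ends(seq: str, qual: str, n5: int = 2, n3: int = 2,
                   min_len: int = 25) -> Optional[Tuple[str, str]]:
    """Remove n5 bases from the 5' end and n3 bases from the 3' end of a read.

    Returns the trimmed (sequence, quality) pair, or None when the trimmed
    read is shorter than min_len. All default values are assumptions.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    if n5 < 0 or n3 < 0:
        raise ValueError("trim lengths must be non-negative")
    end = len(seq) - n3              # index just past the last base kept
    trimmed_seq = seq[n5:end]
    trimmed_qual = qual[n5:end]
    if len(trimmed_seq) < min_len:   # too short to map reliably
        return None
    return trimmed_seq, trimmed_qual
```

Trimmed reads would then be mapped in the same way as untrimmed reads, so that the number of hits recovered with and without trimming can be compared.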
Highlights
Next-Generation Sequencing has revolutionized our approach to ancient DNA research, by providing complete genomic sequences of ancient individuals and extinct species
Illumina and Helicos sequencing reads were mapped against both the horse (Equus caballus) and the human reference genomes; high-quality endogenous hits were defined as hits mapping to a unique location in the horse genome with a mapping quality equal to or higher than 25, while showing no high-quality hit to the human reference genome
As endogenous ancient DNA (aDNA) reads have been shown to exhibit specific miscoding lesions that could be used as molecular signatures of post-mortem DNA damage, nucleotide misincorporation patterns [20] were used to assess the quality of the extra hits recovered with the different mapping strategies explored
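The definition of a high-quality endogenous hit given above can be written as a small predicate. This is a sketch under stated assumptions: the `Hit` record and both function names are hypothetical, and the same criterion (unique location, mapping quality of at least 25) is assumed to define a high-quality hit against the human genome, which the text does not spell out.

```python
from dataclasses import dataclass
from typing import Optional

MIN_MAPQ = 25  # threshold stated in the text for horse hits


@dataclass
class Hit:
    """Best alignment of one read against one reference (hypothetical record)."""
    mapq: int      # mapping quality reported by the aligner
    unique: bool   # True if the read maps to a single location


def is_high_quality(hit: Optional[Hit], min_mapq: int = MIN_MAPQ) -> bool:
    # A read with no alignment at all (None) is never a high-quality hit.
    return hit is not None and hit.unique and hit.mapq >= min_mapq


def is_endogenous(horse_hit: Optional[Hit], human_hit: Optional[Hit]) -> bool:
    # High-quality hit against horse, and no high-quality hit against human.
    # Assumption: the human criterion mirrors the horse criterion.
    return is_high_quality(horse_hit) and not is_high_quality(human_hit)
```

For example, a read with `Hit(mapq=30, unique=True)` against horse and no human alignment is kept, whereas the same read with a unique human hit at mapping quality 37 is treated as likely contamination.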
Summary
Next-Generation Sequencing has revolutionized our approach to ancient DNA (aDNA) research, by providing complete genomic sequences of ancient individuals and extinct species. Together with post-mortem DNA damage and environmental contamination, error profiles specific to the sequencing platform used could limit our ability to map sequencing reads against modern reference genomes and to identify endogenous ancient reads, reducing the efficiency of aDNA shotgun sequencing. The first successful application of library-free third-generation sequencing of aDNA templates has been reported from a Pleistocene horse bone using the Helicos true Single Molecule DNA Sequencing platform (tSMS) [15,16]. Adding to the DNA damage-induced nucleotide misincorporations that are typical of ancient templates, the specificities of this platform could limit our ability to map sequencing reads against modern reference genomes and to identify genuine endogenous tSMS reads, reducing the efficiency of shotgun sequencing.
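A nucleotide misincorporation pattern of the kind used as a damage signature can be approximated by counting, at each position from the 5' end of the read, how often a reference C is read as T. The sketch below is illustrative only and is not the method of reference [20]: the function name `c_to_t_frequency` is hypothetical, the input is assumed to be gap-free (reference, read) pairs oriented from the read's 5' end, and C-to-T is used as the example substitution because it is the change expected from post-mortem cytosine deamination.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def c_to_t_frequency(pairs: Iterable[Tuple[str, str]],
                     n_positions: int = 10) -> List[Optional[float]]:
    """Per-position C->T frequency over the first n_positions of each read.

    pairs: (reference, read) strings of aligned bases without gaps
           (an assumption made to keep the sketch short).
    Returns one value per position; None where no reference C was observed.
    """
    ref_c = Counter()    # times the reference base is C at position i
    c_to_t = Counter()   # times such a C is read as T at position i
    for ref, read in pairs:
        aligned = zip(ref[:n_positions], read[:n_positions])
        for i, (r, q) in enumerate(aligned):
            if r == "C":
                ref_c[i] += 1
                if q == "T":
                    c_to_t[i] += 1
    return [c_to_t[i] / ref_c[i] if ref_c[i] else None
            for i in range(n_positions)]
```

A frequency that is elevated at the first positions and falls towards the read interior would be consistent with damaged ancient templates, whereas a flat profile would be expected for undamaged modern contaminants.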