Abstract
Isolating and sequencing specific regions in a genome is a cornerstone of molecular biology. This has been facilitated by computationally encoding the thermodynamics of DNA hybridization for automated design of hybridization and priming oligonucleotides. However, the repetitive composition of genomes challenges the identification of target-specific oligonucleotides, which limits genetics and genomics research on many species. Here, a tool called ThermoAlign was developed that ensures the design of target-specific primer pairs for DNA amplification. This is achieved by evaluating the thermodynamics of hybridization for full-length oligonucleotide-template alignments — thermoalignments — across the genome to identify primers predicted to bind specifically to the target site. For amplification-based resequencing of regions that cannot be amplified by a single primer pair, a directed graph analysis method is used to identify minimum amplicon tiling paths. Laboratory validation by standard and long-range polymerase chain reaction and amplicon resequencing with maize, one of the most repetitive genomes sequenced to date (≈85% repeat content), demonstrated the specificity-by-design functionality of ThermoAlign. ThermoAlign is released under an open source license and bundled in a dependency-free container for wide distribution. It is anticipated that this tool will facilitate multiple applications in genetics and genomics and be useful in the workflow of high-throughput targeted resequencing studies.
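The minimum amplicon tiling path described in the abstract can be pictured as a shortest-path search over a directed graph. The following is a minimal sketch under stated assumptions, not ThermoAlign's implementation: amplicons are hypothetical half-open `(start, end)` intervals, an edge joins two amplicons when the second extends coverage and overlaps the first by at least `min_overlap` bases, and the function name `min_tiling_path` is illustrative only.

```python
from collections import deque


def min_tiling_path(amplicons, target_start, target_end, min_overlap=0):
    """Return the fewest amplicons covering [target_start, target_end), or None.

    Assumption: each amplicon is a half-open (start, end) interval.
    """
    amps = sorted(set(amplicons))
    # Source nodes: amplicons that cover the first base of the target.
    sources = [i for i, (s, e) in enumerate(amps)
               if s <= target_start < e]
    prev = {i: None for i in sources}  # back-pointers, doubles as visited set
    queue = deque(sources)
    while queue:
        i = queue.popleft()
        s, e = amps[i]
        if e >= target_end:
            # Target end reached: walk the back-pointers to rebuild the path.
            path = []
            while i is not None:
                path.append(amps[i])
                i = prev[i]
            return path[::-1]
        for j, (s2, e2) in enumerate(amps):
            # Directed edge i -> j: j starts later, extends coverage, and
            # overlaps i by at least min_overlap bases.
            if j not in prev and s2 > s and e2 > e and e - s2 >= min_overlap:
                prev[j] = i
                queue.append(j)
    return None  # no chain of overlapping amplicons spans the target


candidates = [(0, 400), (300, 700), (350, 900), (650, 1000), (850, 1200)]
path = min_tiling_path(candidates, 0, 1200, min_overlap=50)
```

Breadth-first search visits amplicons in order of path length, so the first amplicon found that reaches the target end closes a path with the fewest amplicons. A real design would presumably also weigh primer quality, which this sketch ignores.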
Highlights
Isolating and sequencing specific regions in a genome is a cornerstone of molecular biology
Through the current era of high-throughput sequencing, the design of oligonucleotides has remained a fundamental need for genome science research and applications
Specificity of primer pair amplification was addressed in the development of ThermoAlign for the automated design of priming oligonucleotides
Summary
Isolating and sequencing specific regions in a genome is a cornerstone of molecular biology. It has been estimated that >50% [6] and as much as 69% [7] of the human genome is repetitive, and over 80% of the genome of some plant species is repetitive (e.g. refs 8 and 9). This poses a significant challenge to designing oligonucleotides that will hybridize and prime only on-target. Primers used for amplification-based approaches may produce off-target products [13,14]. This presents the need for “genome-aware” oligonucleotide design tools that leverage reference genome sequence data to maximize the enrichment of on-target sequences. Although several computational tools are available that facilitate genome-aware primer design [15,16,17,18], obtaining specific amplification of targeted sequences is still a difficult problem, especially for genomes with large amounts of repetitive DNA. Mispriming depends on reaction chemistry, nucleotide composition and the position and type of mismatching nucleotides [20], such that the number of mismatches alone is likely to be an insufficient measure of mispriming potential.
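To illustrate why a mismatch count alone can be uninformative, the sketch below summarises both the number of mismatches and how close the nearest one lies to the primer's 3' end, where mismatches are generally expected to interfere most with polymerase extension. It is a descriptive toy, not ThermoAlign's thermodynamic model; the function name `mismatch_profile` and the example sequences are hypothetical, and both strings are assumed to be written 5' to 3' in primer orientation.

```python
def mismatch_profile(primer, site):
    """Summarise mismatches between a primer and an equal-length binding site.

    Assumption: both sequences are given 5'->3' in primer orientation, so the
    last character is the primer's 3' terminal base.
    """
    if len(primer) != len(site):
        raise ValueError("primer and site must have the same length")
    n = len(primer)
    # Distance of each mismatch from the 3' end (0 = 3' terminal base).
    distances = [n - 1 - i
                 for i, (p, s) in enumerate(zip(primer, site))
                 if p != s]
    return {
        "count": len(distances),
        "nearest_to_3prime": min(distances) if distances else None,
    }


primer = "ACGTACGTACGTACGTACGT"
site_a = "TGGTACGTACGTACGTACGT"  # two mismatches at the 5' end
site_b = "ACGTACGTACGTACGTACCA"  # two mismatches at the 3' end

profile_a = mismatch_profile(primer, site_a)
profile_b = mismatch_profile(primer, site_b)
```

Both sites carry two mismatches, yet `profile_a` places them 18 or more bases from the 3' end while `profile_b` places one on the 3' terminal base. A count-based filter would treat the two sites identically.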