Abstract

Several applications of high-throughput genome and transcriptome sequencing would benefit from a reduction of the high-copy-number sequences in the libraries being sequenced and analyzed, particularly when applied to species with large genomes. We adapted a method that uses a thermostable duplex-specific nuclease to reduce the high-copy components in transcriptomic and genomic libraries prior to sequencing, and analyzed its consequences. This treatment reduces the time, cost, and computational effort of obtaining informative transcriptomic and genomic sequence data for both fully sequenced and non-sequenced genomes. It also reduces contamination from organellar DNA in preparations of nuclear DNA. Hybridization in the presence of 3 M tetramethylammonium chloride (TMAC), which equalizes the rates of hybridization of GC and AT nucleotide pairs, reduced the bias against sequences with high GC content. The effects of this method on the reduction of high-copy sequences and the enrichment of low-copy sequences are reported for Arabidopsis and lettuce.

Highlights

  • The current generation of DNA sequencing technologies provides opportunities to generate massive amounts of sequence data for both transcriptomes and genomes

  • We have conducted a series of experiments examining the consequences of duplex-specific nuclease (DSN) treatment while generating sequencing libraries for a variety of studies over the past eight years

  • For each set of libraries we used what we considered to be the best protocol at that time in conjunction with the sequencing technology contemporaneously available

Summary

Introduction

The current generation of DNA sequencing technologies provides opportunities to generate massive amounts of sequence data for both transcriptomes and genomes. Genomes of higher eukaryotes, especially those of animal and plant species, often contain highly repeated sequences of varying degrees of complexity and sequence divergence that are difficult to assemble and that interfere with analyses of the low-copy genomic components. Some 35 to 50% of mammalian genomes [1,2,3] and more than 80% of some plant genomes [4,5] consist of low-complexity and highly repeated sequences. The variable copy numbers of families of repeated elements that have diverged from one another over time [2,8,9] make assembly of shotgun-sequenced genomes problematic and inefficient [2,10].
