Abstract

Amplification artifacts introduced during library preparation for the Illumina Genome Analyzer increase the likelihood that an appreciable proportion of the resulting sequences will be duplicates and cause an uneven distribution of read coverage across the targeted sequencing regions. These features complicate genome assembly and variation analysis from short reads, particularly when the sequences are from genomes with base compositions at the extremes of high or low GC content. Here we present an amplification-free method of library preparation in which the cluster amplification step, rather than the polymerase chain reaction, enriches for fully ligated template strands, reducing the incidence of duplicate sequences, improving read mapping and SNP calling, and aiding de novo assembly. We illustrate this by generating and analysing DNA sequences from extremely GC-poor (Plasmodium falciparum), GC-neutral (Escherichia coli) and GC-rich (Bordetella pertussis) genomes.
