Abstract
For the vast majority of species, including many economically or ecologically important organisms, progress in biological research is hampered by the lack of a reference genome sequence. Despite recent advances in sequencing technologies, several factors still limit the availability of such a critical resource. At the same time, many research groups and international consortia have already produced BAC libraries and physical maps and are now in a position to proceed with the development of whole-genome sequences organized around a physical map anchored to a genetic map. We propose a BAC-by-BAC sequencing protocol that combines combinatorial pooling design and second-generation sequencing technology to efficiently approach de novo selective genome sequencing. We show that combinatorial pooling is a cost-effective and practical alternative to exhaustive DNA barcoding when preparing sequencing libraries for hundreds or thousands of DNA samples, such as, in this case, gene-bearing minimum-tiling-path BAC clones. The novelty of the protocol hinges on the computational ability to efficiently compare hundreds of millions of short reads and assign them to the correct BAC clones (deconvolution) so that the assembly can be carried out clone-by-clone. Experimental results on simulated data for the rice genome show that the deconvolution is very accurate and that the resulting BAC assemblies have high quality. Results on real data for a gene-rich subset of the barley genome confirm that the deconvolution is accurate and that the BAC assemblies have good quality. While our method cannot provide the level of completeness that one would achieve with a comprehensive whole-genome sequencing project, we show that it is quite successful in reconstructing the gene sequences within BACs. In the case of plants such as barley, this level of sequence knowledge is sufficient to support critical end-point objectives such as map-based cloning and marker-assisted breeding.
Highlights
The second generation of DNA sequencing instruments is revolutionizing the way molecular biologists design and carry out investigations in genomics and genetics.
Since the number of reads produced by the instrument is essentially fixed, when DNA samples to be sequenced are relatively "short" (e.g., bacterial artificial chromosome (BAC) clones) and the correspondence between reads and their source has to be maintained, several samples must be "multiplexed" within a single lane to optimize the trade-off between cost and sequencing depth.
Pooling minimum-tiling-path BACs: While our method can in general be applied to any set of clones that cover a genome or a portion thereof, the protocol we describe here for selective genome sequencing uses a physical map of bacterial artificial chromosomes (BACs) to identify a set of minimally redundant clones.
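As a rough illustration of the idea behind combinatorial pooling (not the authors' actual pooling design, which is not described in this excerpt), the following Python sketch gives each clone a unique combination of pools and then assigns a read to the clone whose pool signature matches the set of pools in which the read was observed. The function names, the pool counts, and the exact-match rule are assumptions made for this example. Real data would need to tolerate sequencing errors, missing pools, and reads shared by overlapping clones.

```python
from itertools import combinations


def assign_signatures(clones, num_pools, pools_per_clone):
    """Give every clone a distinct set of pools (its 'signature').

    Hypothetical toy design: signatures are simply the first
    len(clones) combinations of `pools_per_clone` pools out of `num_pools`.
    """
    signatures = combinations(range(num_pools), pools_per_clone)
    assignment = {}
    for clone in clones:
        sig = next(signatures, None)
        if sig is None:
            raise ValueError("not enough distinct pool combinations for all clones")
        assignment[clone] = frozenset(sig)
    return assignment


def deconvolve(observed_pools, assignment):
    """Return the clones whose signature equals the observed pool set.

    Exact matching only: an empty list means the read cannot be
    assigned under this simplified rule.
    """
    observed = frozenset(observed_pools)
    return [clone for clone, sig in assignment.items() if sig == observed]


# Ten illustrative clones, five pools, each clone placed in two pools.
clones = [f"BAC{i:02d}" for i in range(10)]
assignment = assign_signatures(clones, num_pools=5, pools_per_clone=2)

# A read seen in exactly the pools of BAC03 is assigned back to BAC03.
hits = deconvolve(assignment["BAC03"], assignment)
print(hits)
```

In this toy setup, ten clones are sequenced using five pools rather than ten individual barcodes, which is the kind of saving that motivates pooling over exhaustive barcoding. The saving grows as the number of clones increases relative to the number of pools.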
Summary
The second generation of DNA sequencing instruments is revolutionizing the way molecular biologists design and carry out investigations in genomics and genetics. These new sequencing technologies (e.g., Illumina, ABI SOLiD) can produce a significantly greater number of reads at a fraction of the cost of Sanger-based technologies but, with the exception of Roche/454 and Ion Torrent (ABI), read lengths are only 50–150 bases. Since the number of reads produced by the instrument is essentially fixed, when DNA samples to be sequenced are relatively "short" (e.g., BAC clones) and the correspondence between reads and their source has to be maintained, several samples must be "multiplexed" within a single lane to optimize the trade-off between cost and sequencing depth. The resulting distribution of reads for each barcoded sample can be severely skewed (see, e.g., [2,3]), necessitating rounds of selective follow-up