Abstract
Functional (meta)genomics allows the high-throughput identification of functional genes in a premise-free way. However, Sanger sequencing of high-GC DNA templates remains difficult, which hinders the functional genomic exploration of high-GC genomic libraries. Here, we developed a procedure that resolves this problem by coupling the Sanger and PacBio sequencing strategies. To test the procedure, we identified cadmium (Cd) resistance genes from a small-insert, high-GC genomic library generated from a high-GC (75.35%) bacterial genome. Nineteen clones that conferred Cd resistance to Escherichia coli were subjected directly to Sanger sequencing. In parallel, the positive clones were subjected to in vivo amplification in host cells, after which the recombinant plasmids were extracted and linearized with selected restriction endonucleases. PacBio sequencing was then performed to obtain the full-length sequences. The partial sequences from Sanger sequencing served as identifiers and were aligned to the full-length sequences from PacBio sequencing, which led to the identification of seven unique full-length sequences. The unique sequences were further aligned to the full genome sequence of the source strain. Functional screening showed that all of the identified positive clones improved the Cd resistance of the host cells. The functional genomic procedure developed here couples the Sanger and PacBio sequencing methods and overcomes the difficulties that PCR-based approaches face with high-GC DNA. The procedure is a promising option for the high-throughput sequencing of functional genomic libraries and can provide cost-effective and time-efficient identification of positive clones, particularly for high-GC genetic material.
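The matching step described above, in which partial Sanger reads are used to pick out full-length PacBio sequences, can be illustrated with a minimal Python sketch. It is not the authors' pipeline: exact substring matching on both strands stands in for a real aligner, and the function names, clone identifiers, and toy sequences are all hypothetical.

```python
# Minimal sketch (assumption: exact substring matching replaces a real aligner).
# All identifiers and sequences below are hypothetical toy data.

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    table = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(table)[::-1]

def assign_reads(partial_reads: dict, full_length: dict) -> dict:
    """Map each partial read ID to the full-length sequence IDs that
    contain the read on either strand."""
    hits = {}
    for read_id, read in partial_reads.items():
        forward = read.upper()
        reverse = reverse_complement(forward)
        hits[read_id] = [
            full_id
            for full_id, full in full_length.items()
            if forward in full.upper() or reverse in full.upper()
        ]
    return hits

def unique_inserts(hits: dict) -> list:
    """Collapse clone-level hits into a sorted list of unique full-length IDs."""
    return sorted({full_id for ids in hits.values() for full_id in ids})

# Toy full-length sequences (standing in for PacBio consensus sequences).
full_length = {
    "insert_A": "GGCCGCGGATCCGCGGCCTTGCGC",
    "insert_B": "CCGGGCGCGTTAACGGCCGG",
}

# Toy partial reads (standing in for Sanger reads from positive clones).
partial_reads = {
    "clone_01": "GGATCCGCGG",  # forward-strand match to insert_A
    "clone_02": "CCGCGGATCC",  # second clone carrying the same insert
    "clone_03": "CCGTTAACGC",  # reverse-strand match to insert_B
}

hits = assign_reads(partial_reads, full_length)
inserts = unique_inserts(hits)
```

Here three clones collapse to two unique inserts, which mirrors how several positive clones can carry the same full-length sequence. Real reads contain sequencing errors, so an alignment tool that tolerates mismatches would be needed in practice.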
Highlights
Base composition substantially impacts genome stability and evolution [1], and high-GC content is thought to be associated with high selective pressure [2]
The positive clones are subject to Sanger sequencing, and aliquots of them are in parallel subject to in vivo amplification in host cells
To verify the derivation of the sequences obtained from the functional genomics procedure, the complete genome of Cellulomonas sp. strain Y8 was sequenced by using the Illumina HiSeq (Illumina, San Diego, CA, USA) and PacBio RS II platforms (Pacific Biosciences, Menlo Park, CA, USA)
Summary
Base composition substantially impacts genome stability and evolution [1], and high-GC content is thought to be associated with high selective pressure [2]. PacBio single molecule real-time (SMRT) sequencing can provide a PCR-independent and efficient way to generate long reads with uniform coverage and high consensus accuracy by recognizing the fluorescent signal on single phospholinked nucleotides [17]. This procedure was applied to a small-insert genomic DNA library of high GC content for the identification of Cd resistance genes. The procedure overcomes the difficulties that PCR-based approaches face with high-GC gene templates and provides cost-effective and time-efficient identification of the positive clones.
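For reference, GC content is the percentage of G and C bases among all called bases in a sequence. The short Python sketch below shows the calculation; the function name and the toy sequence are hypothetical, and ambiguous bases such as N are simply excluded from the denominator, which is one common convention rather than the one necessarily used in the study.

```python
def gc_content(seq: str) -> float:
    """Return GC content as a percentage of unambiguous (A/C/G/T) bases."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / counted

# Hypothetical toy sequence: 7 of 8 bases are G or C.
toy_gc = gc_content("GGCCGCGA")
```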