Abstract

Current methods for genome-wide analysis of gene expression require fragmentation of original transcripts into small fragments for short-read sequencing. In bacteria, the resulting fragmented information hides operon complexity. Additionally, in vivo processing of transcripts confounds the accurate identification of the 5′ and 3′ ends of operons. Here we develop a methodology called SMRT-Cappable-seq that combines the isolation of unfragmented primary transcripts with single-molecule long-read sequencing. Applied to E. coli, this technology results in an accurate definition of the transcriptome, with 34% of known operons from RegulonDB being extended by at least one gene. Furthermore, 40% of transcription termination sites have read-through that alters the gene content of the operons. As a result, most bacterial genes are present in multiple operon variants, reminiscent of eukaryotic splicing. By providing such granularity in the operon structure, this study represents an important resource for the study of prokaryotic gene networks and regulation.

Highlights

  • Current methods for genome-wide analysis of gene expression require fragmentation of original transcripts into small fragments for short-read sequencing

  • Since the first step of most in vivo RNA degradation pathways is thought to consist of the removal of the 5′ triphosphate, the capturing of triphosphorylated RNA removes degraded and/or processed transcripts on the 3′ end, ends generated from RNase E processing[7]

  • Despite size selection of the SMRT-Cappable-seq library favoring long fragments, we found a decent correlation (Spearman’s rank correlation 0.798, p value < 2.2e-16) between gene expression derived from SMRT-Cappable-seq and published Illumina RNA-seq[9] (Fig. 1c and Supplementary Note 1), indicating that SMRT-Cappable-seq is suitable for quantitative measurement of transcript levels
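
The Spearman's rank correlation quoted in the last highlight compares how two methods order genes by expression, not the raw counts themselves. The following is a minimal sketch of that statistic in plain Python, not the authors' analysis code: the function names and the five per-gene counts are hypothetical, chosen only to illustrate the calculation.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y)


# Hypothetical per-gene read counts (illustration only, not data from the study).
long_read_counts = [120, 15, 300, 42, 7]
short_read_counts = [980, 210, 2500, 150, 60]
rho = spearman(long_read_counts, short_read_counts)
```

Because only ranks enter the calculation, the statistic is insensitive to the different count scales of the two platforms, which is presumably why a rank-based measure suits a comparison between a size-selected long-read library and a short-read library.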


Summary

Introduction

Current methods for genome-wide analysis of gene expression require fragmentation of original transcripts into small fragments for short-read sequencing. RNA-seq and microarrays have been instrumental in understanding many of these mechanisms. While these technologies are well suited to interrogating genome-wide expression profiles in windows of hundreds of bases, they do not provide information on the larger transcriptional context (TC) typically found with bacterial operons. This shortcoming in current technologies for transcriptome analysis has impaired our ability to delineate transcript starts and ends, which are typically several kb apart. Sequencing the E. coli transcriptome using SMRT-Cappable-seq reveals complex operon structures, originating notably from the widespread existence of read-through at termination sites. Such read-through can be modulated according to growth conditions, highlighting a possible regulatory mechanism for gene expression.
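
To make the idea of read-through at a termination site concrete, here is a hedged sketch of how one might estimate it from full-length reads. It is an illustration under stated assumptions, not the procedure used in the study: reads are assumed to be (start, end) coordinates on the forward strand, and the function name, the `window` tolerance and the toy coordinates are all hypothetical.

```python
from typing import Iterable, Optional, Tuple


def read_through_fraction(
    reads: Iterable[Tuple[int, int]],
    tts: int,
    window: int = 10,
) -> Optional[float]:
    """Fraction of informative reads that extend past a termination site.

    Assumptions (illustrative only): forward-strand reads given as
    (start, end); a read ending within `window` bases of `tts` counts as
    terminated, and a read ending further downstream counts as read-through.
    """
    terminated = 0
    read_through = 0
    for start, end in reads:
        if start >= tts - window:
            continue  # begins at or after the site, so it cannot inform on it
        if abs(end - tts) <= window:
            terminated += 1
        elif end > tts + window:
            read_through += 1
        # Reads ending well upstream of the site are ignored.
    informative = terminated + read_through
    return read_through / informative if informative else None


# Toy coordinates (hypothetical): three reads stop near position 1500,
# one continues into the downstream gene.
toy_reads = [(100, 1500), (100, 1503), (120, 1498), (100, 2800),
             (100, 900), (1600, 2800)]
fraction = read_through_fraction(toy_reads, tts=1500)
```

A calculation of this kind is only possible with unfragmented reads, since each read carries both the start and the end of the same transcript molecule; short fragments cannot say whether a transcript that started upstream stopped at the site or continued past it.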

