Abstract

Differences in transcription start sites (TSS) and transcription end sites (TES) among gene isoforms can affect the stability, localization, and translation efficiency of mRNA. Gene isoforms allow a single gene to perform diverse functions across different cell types, and isoform dynamics allow those functions to change over time. However, methods to efficiently identify and quantify RNA isoforms genome-wide in single cells are still lacking. Here, we introduce single-cell RNA Cap And Tail sequencing (scRCAT-seq), a method that demarcates the boundaries of isoforms using short-read sequencing, with higher efficiency and lower cost than existing long-read sequencing methods. In conjunction with machine learning algorithms, scRCAT-seq demarcates RNA transcripts with unprecedented accuracy. We identified hundreds of previously uncharacterized transcripts and thousands of alternative transcripts for known genes, revealed cell-type-specific isoforms for various cell types across different species, and generated a cell atlas of isoform dynamics during the development of retinal cones.

Highlights

  • The differences in transcription start sites (TSS) and transcription end sites (TES) among gene isoforms can affect the stability, localization, and translation efficiency of mRNA

  • Looking into ERCC data generated by other methods[19,26], such as C1 CAGE and C1 STRT, we found high false-positive rates for peaks identified as TSSs in these datasets (Supplementary Fig. 1c). Applying the machine learning model increased the accuracy to above 88.9% (Supplementary Fig. 1d), indicating that our model can be applied to other datasets with high false-positive rates

  • Machine learning has been successfully used to predict differential alternative splicing[32,33], but no such model has been developed to identify the authentic demarcations of RNA isoforms and thereby elucidate the transcriptomic complexity of single cells
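The ERCC comparison in the highlights above relies on the fact that spike-in transcripts have known ends, so a called peak can be checked against the expected position. The following is a minimal sketch of that kind of check, not the authors' pipeline: the `Peak` record, the spike-in names, the 5-nt tolerance, the 0.5 score threshold, and all toy values are assumptions made for illustration, and the metric shown is plain precision (fraction of called peaks near a known start), which may differ from the accuracy measure used in the paper.

```python
from typing import Dict, List, NamedTuple


class Peak(NamedTuple):
    """A called TSS peak on a spike-in transcript (hypothetical record layout)."""
    transcript: str   # spike-in identifier (toy names below, not real ERCC IDs)
    position: int     # 1-based position of the peak on the transcript
    score: float      # assumed classifier probability that the peak is authentic


def precision(peaks: List[Peak], true_tss: Dict[str, int], tolerance: int = 5) -> float:
    """Fraction of called peaks lying within `tolerance` nt of the known start."""
    if not peaks:
        return 0.0
    hits = sum(
        1 for p in peaks
        if abs(p.position - true_tss[p.transcript]) <= tolerance
    )
    return hits / len(peaks)


def filter_by_score(peaks: List[Peak], threshold: float = 0.5) -> List[Peak]:
    """Keep only peaks whose (assumed) classifier score reaches the threshold."""
    return [p for p in peaks if p.score >= threshold]


# Toy data: spike-ins are taken to start at position 1 of their sequence.
true_tss = {"spikein_A": 1, "spikein_B": 1}
peaks = [
    Peak("spikein_A", 1, 0.95),    # at the known start
    Peak("spikein_A", 310, 0.10),  # internal peak, a false positive
    Peak("spikein_B", 3, 0.80),    # within tolerance of the known start
    Peak("spikein_B", 520, 0.30),  # internal peak, a false positive
]

raw_precision = precision(peaks, true_tss)
filtered_precision = precision(filter_by_score(peaks), true_tss)
print(f"before filtering: {raw_precision:.2f}, after filtering: {filtered_precision:.2f}")
```

In this toy case the two internal peaks receive low scores and are dropped, so precision rises after filtering. The numbers are illustrative only and do not reproduce any result from the study.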


Summary

Introduction

Differences in transcription start sites (TSS) and transcription end sites (TES) among gene isoforms can affect the stability, localization, and translation efficiency of mRNA. Methods based on single-cell full-length cDNA amplification, such as Smart-seq[2], can detect full-length cDNA, but their coverage at both ends is low, and they cannot accurately distinguish the start and end positions of different transcript isoforms of the same gene[20,21]. To address these problems, we introduce a simple and efficient approach, based on well-established short-read sequencing platforms, that explicitly captures the transcription initiation and termination sites of RNA isoforms in single cells. When deployed in conjunction with optimized machine learning models, scRCAT-seq is more accurate, cost-effective, and efficient than existing methods in profiling isoforms with alternative TSS/TES choices.

