Abstract

Accurate annotations of genes and their transcripts are a foundation of genomics, but no annotation technique presently combines throughput and accuracy. As a result, reference gene collections remain incomplete: many gene models are fragmentary, while thousands more remain uncatalogued, particularly for long noncoding RNAs (lncRNAs). To accelerate lncRNA annotation, the GENCODE consortium has developed RNA Capture Long Seq (CLS), which combines targeted RNA capture with third-generation long-read sequencing. We present an experimental re-annotation of the GENCODE intergenic lncRNA population in matched human and mouse tissues, yielding novel transcript models for 3,574 human and 561 mouse gene loci. CLS approximately doubles the annotated complexity of targeted loci, outperforming existing short-read techniques. The full-length transcript models produced by CLS enable us to definitively characterize the genomic features of lncRNAs, including promoter and gene structure and protein-coding potential. Thus, CLS removes a longstanding bottleneck of transcriptome annotation, generating manual-quality full-length transcript models at high-throughput scales.

Highlights

  • Long noncoding RNAs represent a vast and largely unexplored component of the mammalian genome

  • We created a comprehensive capture library targeting the set of intergenic GENCODE long noncoding RNAs (lncRNAs) in human and mouse

  • Capture Long Read Sequencing produces transcript models with quality approaching that of human annotators, yet with throughput comparable to in silico transcriptome reconstruction

Introduction

Long noncoding RNAs (lncRNAs) represent a vast and largely unexplored component of the mammalian genome. Efforts to assign lncRNA functions rest on the availability of high-quality transcriptome annotations. At present, such annotations are still rudimentary: we have little idea of the total number of lncRNAs, and for those that have been identified, transcript structures remain largely incomplete. Gene sets deriving from a mixture of FANTOM cDNA sequencing efforts and public databases[1,2] were joined by the "lincRNA" (long intergenic non-coding RNA) sets, discovered through chromatin signatures[3]. The reference for lncRNAs has since become the regularly updated manual annotation from GENCODE, which is based on the curation of cDNAs and ESTs by human annotators[10,11] and has been adopted by international genomics consortia[12,13,14,15].
