Abstract

Long non-coding RNAs (lncRNAs) are a large class of gene transcripts with regulatory functions discovered in recent years. Many more are expected to be revealed as RNA-seq data accumulate from diverse types of normal and diseased tissues. However, discovering novel lncRNAs and accurately quantifying known lncRNAs from massive RNA-seq data is not trivial. Herein we describe UClncR, an Ultrafast and Comprehensive lncRNA detection pipeline, to tackle this challenge. UClncR takes a standard RNA-seq alignment file, performs transcript assembly, predicts lncRNA candidates, quantifies and annotates both known and novel lncRNA candidates, and generates a convenient report for downstream analysis. The pipeline accommodates both un-stranded and stranded RNA-seq so that lncRNAs overlapping with other genes can be predicted and quantified. UClncR is fully parallelized in a cluster environment yet allows users to run samples sequentially without a cluster. The pipeline can process a typical RNA-seq sample in a matter of minutes and complete hundreds of samples in a matter of hours. Analysis of predicted lncRNAs from two test datasets demonstrated UClncR’s accuracy and the relevance of the predicted lncRNAs to the samples’ clinical phenotypes. UClncR should significantly facilitate researchers’ discovery of novel lncRNAs and is publicly available at http://bioinformaticstools.mayo.edu/research/UClncR.

Highlights

  • Long non-coding RNA is a large class of gene transcripts with regulatory functions discovered in recent years

  • As most newly discovered long non-coding RNAs (lncRNAs) come from a limited number of tissue or cell types, many new lncRNAs are expected to remain uncharacterized in diverse and heterogeneous human diseased tissues, such as different types of human cancer

  • The results are summarized in an HTML index page that starts with the project description, configuration settings, and analytical workflow


Summary

Introduction

Long non-coding RNAs (lncRNAs) are a large class of gene transcripts with regulatory functions discovered in recent years. UClncR takes a standard RNA-seq alignment file, performs transcript assembly, predicts lncRNA candidates, quantifies and annotates both known and novel lncRNA candidates, and generates a convenient report for downstream analysis. The pipeline accommodates both un-stranded and stranded RNA-seq so that lncRNAs overlapping with other genes can be predicted and quantified. Sebnif (self-estimation based novel lincRNA filter pipeline) offers a partial solution [7]. It takes pre-assembled transcripts (such as from Cufflinks) and predicts “novel” long intergenic non-coding RNAs (lincRNAs) for a sample. An integrated pipeline that combines all steps and works efficiently is needed.
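
To make concrete why strand information matters for lncRNAs that overlap other genes, the following is a minimal Python sketch, not UClncR's actual code. The `Interval` class, the function names, and the 200-nt length cutoff (the conventional lower bound for lncRNAs) are assumptions chosen for illustration. A real pipeline would also assess coding potential, which is omitted here.

```python
from dataclasses import dataclass

# Assumed cutoff: lncRNAs are conventionally defined as longer than ~200 nt.
MIN_LNCRNA_LENGTH = 200


@dataclass(frozen=True)
class Interval:
    """Hypothetical record for an assembled transcript or an annotated gene."""
    name: str
    chrom: str
    start: int       # 0-based, inclusive
    end: int         # 0-based, exclusive
    strand: str      # "+", "-", or "." when the strand is unknown
    length: int = 0  # spliced length in nt (used for transcripts only)


def overlaps(a, b, stranded):
    """Return True if a and b overlap; with stranded data, require the same strand."""
    if a.chrom != b.chrom or a.end <= b.start or b.end <= a.start:
        return False
    if stranded and "." not in (a.strand, b.strand):
        return a.strand == b.strand
    # Un-stranded data (or unknown strand): any positional overlap counts.
    return True


def lncrna_candidates(transcripts, coding_genes, stranded):
    """Keep transcripts that are long enough and do not overlap a coding gene."""
    return [
        t for t in transcripts
        if t.length >= MIN_LNCRNA_LENGTH
        and not any(overlaps(t, g, stranded) for g in coding_genes)
    ]
```

In this sketch, an antisense transcript inside a protein-coding gene is discarded when the library is treated as un-stranded, because its origin is ambiguous, but it is retained as a candidate when strand information is available.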

