Abstract

Long intergenic noncoding RNAs (lincRNAs) play a crucial role in many biological processes. The rat is an important model organism in biomedical research. Recent studies have detected rat lincRNA genes in several samples. However, identification of rat lincRNAs using large-scale RNA-seq datasets remains unreported. Herein, using more than 100 billion RNA-seq reads from 59 publications together with RefSeq and UniGene annotated RNAs, we report 39,154 lincRNA transcripts encoded by 19,162 lincRNA genes in the rat. We reveal sequence and expression similarities among the lincRNAs of rat, mouse and human. The DNA methylation level of lincRNAs is higher than that of protein-coding genes across the transcription start sites (TSSs). Moreover, three lincRNA genes overlap with differentially methylated regions (DMRs) that are associated with spontaneous hypertension. In addition, three transcription factors (HNF4A, CEBPA and FOXA1) show similar binding trends at lincRNA genes and protein-coding genes, indicating that the two gene classes harbour similar transcription regulatory mechanisms. To date, this is the most comprehensive assessment of lincRNAs in the rat genome. We provide valuable data that will advance lincRNA research using the rat as a model.

Highlights

  • Long noncoding RNAs are a set of transcripts that are longer than 200 nt and do not encode proteins

  • For a comprehensive retrieval of lincRNA genes in the rat genome, we developed a bioinformatics pipeline that integrates RNA-seq datasets with predetermined annotations from Ensembl, RefSeq and UniGene

  • The pipeline involves five main steps: (1) aligning to reconstructed transcripts, (2) filtering low-quality transcripts, (3) keeping long intergenic multi-exonic transcripts, (4) evaluating coding potential of transcripts and (5) eliminating house-keeping RNAs. This approach is similar to the one we have applied in our previous studies on domestic animal lncRNAs[30,31].
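
As a rough illustration of steps (3) and (4) above, the following Python sketch keeps transcripts that are longer than 200 nt, multi-exonic, intergenic and of low coding potential. It is a minimal sketch under stated assumptions, not the authors' pipeline: the `Transcript` class, the `coding_score` field, the 0.5 cutoff and the function names are all hypothetical, and alignment, quality filtering and house-keeping RNA removal (steps 1, 2 and 5) are not covered.

```python
from dataclasses import dataclass
from typing import List, Tuple

MIN_LENGTH_NT = 200  # lncRNAs are defined as longer than 200 nt


@dataclass
class Transcript:
    name: str
    chrom: str
    exons: List[Tuple[int, int]]  # sorted, half-open (start, end) intervals
    coding_score: float           # hypothetical coding-potential score in [0, 1]

    @property
    def start(self) -> int:
        return self.exons[0][0]

    @property
    def end(self) -> int:
        return self.exons[-1][1]

    @property
    def length(self) -> int:
        # Spliced length: sum of exon lengths, introns excluded
        return sum(end - start for start, end in self.exons)


def overlaps_any_gene(t: Transcript, genes: List[Tuple[str, int, int]]) -> bool:
    """True if the transcript span overlaps any (chrom, start, end) gene interval."""
    return any(
        chrom == t.chrom and start < t.end and t.start < end
        for chrom, start, end in genes
    )


def is_candidate_lincrna(
    t: Transcript,
    coding_genes: List[Tuple[str, int, int]],
    max_coding_score: float = 0.5,  # assumed cutoff, for illustration only
) -> bool:
    return (
        t.length > MIN_LENGTH_NT                       # long
        and len(t.exons) >= 2                          # multi-exonic
        and not overlaps_any_gene(t, coding_genes)     # intergenic
        and t.coding_score < max_coding_score          # low coding potential
    )


def filter_lincrnas(transcripts, coding_genes):
    """Return the names of transcripts that pass all four checks."""
    return [t.name for t in transcripts if is_candidate_lincrna(t, coding_genes)]
```

A real analysis would read transcript models from a GTF file and take the coding-potential score from a dedicated tool; here both are supplied directly so that the filtering logic stays visible.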


Summary

Introduction

Long noncoding RNAs (lncRNAs) are a set of transcripts that are longer than 200 nt and do not encode proteins. Amy Leung et al. annotated 466 lncRNA transcripts from two rat vascular smooth muscle cells[14], Feng Wang et al. uncovered 2,761 lncRNA transcripts corresponding to 1,620 gene loci from six rat tissues[15] and Kathirvel Gopalakrishnan et al. identified 3,272 lncRNA transcripts from three rat strains[16]. While these studies report the locations of the lncRNAs, their gene structures remain unclear. DNA methylation patterns differ between lincRNA genes and protein-coding genes in the rat genome, whereas both gene classes show similar TF-binding patterns around TSSs. To facilitate future research on lincRNAs, we provide an open-access database named RatTransc. The rat lncRNA landscape can serve as a useful resource for medical research using the rat as a model, and provide valuable biomarkers for disease diagnosis.

