Abstract
Background: Long non-coding RNAs (lncRNAs) are increasingly implicated as gene regulators and may ultimately be more numerous than protein-coding genes in the human genome. Despite large numbers of reported lncRNAs, reference annotations are likely incomplete due to their lower and tighter tissue-specific expression compared to mRNAs. An unexplored factor potentially confounding lncRNA identification is inter-individual expression variability. Here, we characterize lncRNA natural expression variability in human primary granulocytes.
Results: We annotate granulocyte lncRNAs and mRNAs in RNA-seq data from 10 healthy individuals, identifying multiple lncRNAs absent from reference annotations, and use this to investigate three known features (higher tissue-specificity, lower expression, and reduced splicing efficiency) of lncRNAs relative to mRNAs. Expression variability was examined in seven individuals sampled three times at 1- or more than 1-month intervals. We show that lncRNAs display significantly more inter-individual expression variability compared to mRNAs. We confirm this finding in two independent human datasets by analyzing multiple tissues from the GTEx project and lymphoblastoid cell lines from the GEUVADIS project. Using the latter dataset we also show that including more human donors into the transcriptome annotation pipeline allows identification of an increasing number of lncRNAs, but minimally affects mRNA gene number.
Conclusions: A comprehensive annotation of lncRNAs is known to require an approach that is sensitive to low and tight tissue-specific expression. Here we show that increased inter-individual expression variability is an additional general lncRNA feature to consider when creating a comprehensive annotation of human lncRNAs or proposing their use as prognostic or disease markers.
Electronic supplementary material: The online version of this article (doi:10.1186/s13059-016-0873-8) contains supplementary material, which is available to authorized users.
Highlights
Long non-coding RNAs are increasingly implicated as gene regulators and may be more numerous than protein-coding genes in the human genome
We validated these criteria by applying the pipeline to the above public annotations; this identified the majority of annotated long non-coding RNAs (lncRNAs) as non-protein-coding, whereas the majority of mRNAs were identified as protein-coding (Additional file 1: Figure S1E)
We demonstrate here by analysis of human granulocyte RNA-seq data from multiple individuals that lncRNAs show unusually high natural expression variability compared to mRNAs
Summary
Long non-coding RNAs (lncRNAs) are increasingly implicated as gene regulators and may be more numerous than protein-coding genes in the human genome. lncRNAs have emerged as a fundamental new layer of genomic information in diverse species [1]. They are considered to participate primarily in mRNA gene regulation [2, 3, 4, 5] and to play roles in development and disease [6, 7, 8]. Incomplete annotation may arise from two known features of lncRNAs: low abundance and tight tissue-specificity [14, 25]. A recent attempt to define the human lncRNA landscape used several thousand normal and malignant samples and identified almost 47,000 new lncRNA genes [29], supporting earlier predictions that lncRNAs may outnumber protein-coding genes in humans [30].