Flnc: Machine Learning Improves the Identification of Novel Long Noncoding RNAs from Stand-Alone RNA-Seq Data.

Zixiu Li, Euijin Kwon, Katherine A Fitzgerald, Peng Zhou, Zhiping Weng, Chan Zhou

doi:10.3390/ncrna8050070

Open Access

https://doi.org/10.3390/ncrna8050070

Abstract

Long noncoding RNAs (lncRNAs) play critical regulatory roles in human development and disease. Although RNA sequencing (RNA-seq) data are available for over 100,000 samples, many lncRNAs have yet to be annotated. The conventional approach to identifying novel lncRNAs from RNA-seq data is to find transcripts without coding potential, but this approach has a false discovery rate of 30–75%. Other existing methods either identify only multi-exon lncRNAs, missing single-exon lncRNAs, or require transcriptional initiation profiling data (such as H3K4me3 ChIP-seq data), which are unavailable for many samples with RNA-seq data. Because of these limitations, current methods cannot accurately identify novel lncRNAs from existing RNA-seq data. To address this problem, we have developed Flnc, software that accurately identifies both novel and annotated full-length lncRNAs, including single-exon lncRNAs, directly from RNA-seq data without requiring transcriptional initiation profiles. Flnc integrates machine learning models built on four types of features: transcript length, promoter signature, multiple exons, and genomic location. Flnc achieves state-of-the-art prediction power, with an AUROC above 0.92, and raises prediction accuracy from less than 50% with the conventional approach to over 85%. Flnc is available on GitHub.
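
For readers unfamiliar with the AUROC metric cited in the abstract, the following is a minimal sketch of how such a score can be computed from predicted scores and true labels. It is not taken from Flnc; the function name `auroc` and the toy labels and scores are assumptions made purely for illustration. It uses the pairwise definition: the probability that a randomly chosen positive (for example, a true lncRNA) receives a higher score than a randomly chosen negative, with ties counted as one half.

```python
def auroc(labels, scores):
    """Area under the ROC curve via the pairwise (rank) definition.

    labels: iterable of 0/1 ground-truth labels (1 = positive).
    scores: iterable of classifier scores, higher means more likely positive.
    Illustrative helper only; not part of the Flnc software.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): two positives and two negatives.
toy_labels = [1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_auc = auroc(toy_labels, toy_scores)
```

On this toy input, three of the four positive/negative pairs are ranked correctly, so the score is 0.75. A perfect ranking gives 1.0 and a random ranking gives about 0.5, which is the scale on which the reported value above 0.92 should be read.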

Journal: Non-Coding RNA
Publication Date: Oct 13, 2022
Citations: 1
License type: CC BY 4.0
