Predicting Coding Potential of RNA Sequences by Solving Local Data Imbalance

Xian-Gan Chen, Wen Zhang, Shuai Liu

doi:10.1109/tcbb.2020.3021800

Abstract

Non-coding RNAs (ncRNAs) play an important role in various biological processes and are associated with diseases. Distinguishing between coding RNAs and ncRNAs, also known as predicting the coding potential of RNA sequences, is critical for downstream biological function analysis. Many machine learning-based methods have been proposed for this task. Recent studies reveal that most existing methods perform poorly on RNA sequences with short open reading frames (sORFs, ORF length < 303 nt). In this work, we analyze the distribution of ORF lengths of RNA sequences and observe that coding RNAs with sORFs are scarce and far fewer than ncRNAs with sORFs. Thus, RNA sequences with sORFs suffer from a local data imbalance. We propose a coding potential prediction method, CPE-SLDI, which uses data oversampling techniques to augment the samples of coding RNAs with sORFs so as to alleviate this local data imbalance. Compared with existing methods, CPE-SLDI achieves better performance, and our studies reveal that data augmentation by various data oversampling techniques can enhance the performance of coding potential prediction, especially for RNA sequences with sORFs. The implementation of the proposed method is available at https://github.com/chenxgscuec/CPESLDI.
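The idea of local oversampling can be illustrated with a minimal Python sketch. This is not the CPE-SLDI implementation (see the repository above for that); it is an illustration under stated assumptions. The function names `longest_orf_length` and `oversample_sorf_coding` are hypothetical, the ORF definition (ATG to the first in-frame stop codon, forward strand only, stop codon included in the length) is an assumption, and plain random duplication stands in for the oversampling techniques used in the paper, which the abstract does not enumerate.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}
SORF_THRESHOLD = 303  # nt; the abstract defines sORF as ORF length < 303 nt


def longest_orf_length(seq):
    """Length (nt) of the longest ATG..stop ORF over the three forward frames.

    Assumption: the stop codon counts toward the length, and an ORF
    without an in-frame stop codon is ignored.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabets
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


def oversample_sorf_coding(samples, seed=0):
    """Balance the sORF region by duplicating coding sORF samples at random.

    samples: list of (sequence, label) pairs, label 1 = coding, 0 = non-coding.
    Samples with longer ORFs are left untouched, so only the local
    imbalance is addressed.
    """
    rng = random.Random(seed)
    coding_sorf, noncoding_sorf = [], []
    for seq, label in samples:
        if longest_orf_length(seq) < SORF_THRESHOLD:
            (coding_sorf if label == 1 else noncoding_sorf).append((seq, label))
    deficit = len(noncoding_sorf) - len(coding_sorf)
    if deficit <= 0 or not coding_sorf:
        return list(samples)  # nothing to balance, or nothing to copy from
    extra = [rng.choice(coding_sorf) for _ in range(deficit)]
    return list(samples) + extra


toy = [("AUGAAAUAA", 1), ("AUGCCCUGA", 0), ("AUGGGGUAG", 0), ("CCCCCC", 0)]
balanced = oversample_sorf_coding(toy)
```

In the toy data, the single coding sORF sample is duplicated until the sORF region holds as many coding as non-coding samples. A synthetic-sample method such as SMOTE would operate on feature vectors rather than raw sequences, so it would be applied after feature extraction rather than at this stage.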

Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Publication Date: Sep 4, 2020
Citations: 15
