Identifying LncRNA-Encoded Short Peptides Using Optimized Hybrid Features and Ensemble Learning

Siyuan Zhao, Yushi Luan, Qiang Kang, Jun Meng

doi:10.1109/tcbb.2021.3104288

Abstract

Long non-coding RNA (lncRNA) contains short open reading frames (sORFs), and sORF-encoded short peptides (SEPs) have become a focus of scientific study because of their crucial role in life activities. Identifying SEPs is vital to further understanding their regulatory function. Bioinformatics methods can quickly identify SEPs and so provide credible candidate sequences for verification by biological experiments. However, methods that identify SEPs directly are lacking. In this study, a machine learning method to identify SEPs of plant lncRNA (ISPL) is proposed. Hybrid features, including sequence features and physicochemical features, are extracted manually or adaptively to construct different modal features. To keep feature selection stable, a feature selection method that applies non-linear correction in Max-Relevance-Max-Distance (nocRD) is proposed; it integrates multiple feature ranking results and uses the iterative random forest to reduce the dimensionality of the different modal features. Classification models are constructed on the different modal features, and their outputs are combined for ensemble classification. The experimental results show that ISPL reaches an accuracy of 89.86% on the independent test set, which will have important implications for further studies of functional genomics.
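The abstract names two general ideas: integrating several feature rankings into one, and combining the outputs of models trained on different modal features. The sketch below illustrates both in their simplest generic form, mean-rank aggregation and soft voting. It is not the nocRD method, it does not use the iterative random forest, and it is not the ISPL implementation; the feature names, rankings, and probabilities are hypothetical values chosen only for illustration.

```python
from statistics import mean


def aggregate_rankings(rankings):
    """Combine several feature rankings (best feature first) by mean position.

    Generic mean-rank aggregation, used here as a stand-in; it is NOT nocRD.
    """
    features = set(rankings[0])
    for ranking in rankings:
        if set(ranking) != features:
            raise ValueError("every ranking must cover the same features")
    mean_rank = {f: mean(r.index(f) for r in rankings) for f in features}
    # Sort by mean position; break ties by name so the result is deterministic.
    return sorted(features, key=lambda f: (mean_rank[f], f))


def soft_vote(probabilities, threshold=0.5):
    """Average per-model positive-class probabilities, then threshold them.

    `probabilities` holds one list per model, each with one value per sample.
    """
    averaged = [mean(per_sample) for per_sample in zip(*probabilities)]
    return [int(p >= threshold) for p in averaged]


# Hypothetical feature names and rankings from three ranking criteria.
rankings = [
    ["gc_content", "orf_length", "hydrophobicity"],
    ["orf_length", "gc_content", "hydrophobicity"],
    ["orf_length", "hydrophobicity", "gc_content"],
]
consensus = aggregate_rankings(rankings)

# Hypothetical probabilities for three samples from two modal models.
model_probs = [
    [0.9, 0.2, 0.6],  # model assumed to use sequence features
    [0.7, 0.4, 0.3],  # model assumed to use physicochemical features
]
labels = soft_vote(model_probs)

print(consensus)
print(labels)
```

In this toy input, `orf_length` has the lowest mean position and so leads the consensus ranking, and only the first sample has an averaged probability at or above the 0.5 threshold.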

Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics
Publication Date: Sep 1, 2022
Citations: 5

Similar Papers

Feature Selection and Feature Stability Measurement Method for High-Dimensional Small Sample Data Based on Big Data Technology.
Chengyuan Huang
Computational Intelligence and Neuroscience | VOL. 2021
01 Jan 2020

LncRNA-Encoded Short Peptides Identification Using Feature Subset Recombination and Ensemble Learning.
Siyuan Zhao ... Yushi Luan
Interdisciplinary sciences, computational life sciences | VOL. 14
25 Jul 2021

A review of the stability of feature selection techniques for bioinformatics data
Wael Awada ... Amri Napolitano
01 Aug 2012

Hybrid Feature Selection and Ensemble Learning Methods for Gene Selection and Cancer Classification
Sultan Noman Qasem ... Faisal Saeed
International Journal of Advanced Computer Science and Applications | VOL. 12
01 Jan 2020
