Predicting Long non-coding RNAs through feature ensemble learning

Yanzhen Xu, Wen Zhang, Xiaohan Zhao, Shuai Liu

doi:10.1186/s12864-020-07237-y

Open Access

Journal: BMC Genomics
Publication Date: Dec 1, 2020
Citations: 3
License type: open-access

Affiliation: Huazhong Agricultural University

Abstract

Background: Many transcripts have been generated due to the development of sequencing technologies, and lncRNA is an important type of transcript. Predicting lncRNAs from transcripts is a challenging and important task. Traditional experimental lncRNA prediction methods are time-consuming and labor-intensive. Efficient computational methods for lncRNA prediction are in demand.

Results: In this paper, we propose two lncRNA prediction methods based on feature ensemble learning strategies named LncPred-IEL and LncPred-ANEL. Specifically, we encode sequences into six different types of features including transcript-specified features and general sequence-derived features. Then we consider two feature ensemble strategies to utilize and integrate the information in different feature types, the iterative ensemble learning (IEL) and the attention network ensemble learning (ANEL). IEL employs a supervised iterative way to ensemble base predictors built on six different types of features. ANEL introduces an attention mechanism-based deep learning model to ensemble features by adaptively learning the weight of individual feature types. Experiments demonstrate that both LncPred-IEL and LncPred-ANEL can effectively separate lncRNAs and other transcripts in feature space. Moreover, comparison experiments demonstrate that LncPred-IEL and LncPred-ANEL outperform several state-of-the-art methods when evaluated by 5-fold cross-validation. Both methods have good performances in cross-species lncRNA prediction.

Conclusions: LncPred-IEL and LncPred-ANEL are promising lncRNA prediction tools that can effectively utilize and integrate the information in different types of features.
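To make the idea of a "general sequence-derived feature" concrete, the following is a minimal sketch of one common encoding, normalized k-mer frequencies. This is an assumption for illustration only: the abstract does not name the six feature types, and the function name `kmer_frequencies` and the default `k=3` are hypothetical choices, not taken from the paper.

```python
from itertools import product


def kmer_frequencies(seq, k=3):
    """Return normalized k-mer frequencies over the ACGT alphabet.

    Illustrative only: the paper's actual feature definitions are not
    given in this section.
    """
    # Treat RNA (U) and DNA (T) spellings of a transcript the same way.
    seq = seq.upper().replace("U", "T")
    # Fixed lexicographic ordering of all 4**k possible k-mers.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
            total += 1
    # Normalize so that transcripts of different lengths are comparable.
    return [counts[m] / total if total else 0.0 for m in kmers]


features = kmer_frequencies("AUGGCUAACGUAG", k=3)
print(len(features))  # 64 entries, one per 3-mer
```

Each feature type would yield a vector of this kind per transcript; the ensemble strategies described in the abstract then combine the information across those vectors.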

Highlights

Many transcripts have been generated due to the development of sequencing technologies, and long non-coding RNA (lncRNA) is an important type of transcript.
Traditional experimental methods for lncRNA identification are time-consuming and labor-intensive, and cannot perform lncRNA prediction when dealing with a massive number of transcripts.
For model construction, ensemble learning models and deep learning models have been used in lncRNA prediction methods, but existing models lack consideration of the intricate interactions between different types of features.

Summary

Results

We propose two lncRNA prediction methods based on feature ensemble learning strategies, named LncPred-IEL and LncPred-ANEL. We consider two feature ensemble strategies to utilize and integrate the information in different feature types: iterative ensemble learning (IEL) and attention network ensemble learning (ANEL). ANEL introduces an attention mechanism-based deep learning model to ensemble features by adaptively learning the weight of individual feature types. Experiments demonstrate that both LncPred-IEL and LncPred-ANEL can effectively separate lncRNAs and other transcripts in feature space. Comparison experiments demonstrate that LncPred-IEL and LncPred-ANEL outperform several state-of-the-art methods when evaluated by 5-fold cross-validation. Both methods also perform well in cross-species lncRNA prediction.
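The attention-based weighting of feature types can be illustrated with a small sketch. It is not the ANEL architecture itself, which this section does not specify. It assumes each feature type has already been mapped to an embedding of the same length, and it uses a fixed scoring vector `query` where a real model would learn one. The names `softmax`, `attention_fuse` and `query` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(embeddings, query):
    """Fuse per-feature-type embeddings with attention weights.

    embeddings: list of equal-length vectors, one per feature type.
    query: scoring vector (learned in a real model; fixed here).
    Returns (weights, fused), where the weights sum to 1.
    """
    # Score each feature type: dot product of query with tanh(embedding).
    scores = [
        sum(q * math.tanh(x) for q, x in zip(query, h)) for h in embeddings
    ]
    weights = softmax(scores)
    dim = len(query)
    # The fused representation is the attention-weighted sum of embeddings.
    fused = [
        sum(w * h[j] for w, h in zip(weights, embeddings)) for j in range(dim)
    ]
    return weights, fused


# Three toy feature-type embeddings of dimension 2.
weights, fused = attention_fuse(
    [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]], query=[1.0, 1.0]
)
print(weights, fused)
```

In a trained model the scoring parameters would be fitted jointly with the classifier, so feature types that are more informative for a given transcript receive larger weights.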

Background

Results and discussion

Conclusion

Methods