Abstract

Background: Long noncoding RNAs (lncRNAs) are closely linked to many important cellular activities and exert their functions by interacting with corresponding RNA-binding proteins. Because experimental techniques for detecting lncRNA-protein interactions (LPIs) are laborious and time-consuming, several computational methods have been reported for LPI prediction. However, these methods have the following limitations: (1) Most were evaluated on a single dataset, so their generalization ability may not have been measured. (2) The majority were validated only under cross validation on lncRNA-protein pairs; their performance under other cross validations, especially on independent lncRNAs and independent proteins, was not investigated. (3) lncRNAs and proteins carry abundant biological information, and how to select informative features needs further investigation.

Results: This work focuses on finding new LPIs with a hybrid framework (LPI-HyADBS) that integrates AdaBoost-based feature selection with three classification models: a deep neural network (DNN), extreme gradient boosting (XGBoost), and an SVM with a penalty coefficient of misclassification (C-SVM). First, five datasets are arranged, each containing lncRNA sequences, protein sequences, and an LPI network. Second, biological features of lncRNAs and proteins are acquired with Pyfeat. Third, the obtained features of lncRNAs and proteins are selected with AdaBoost and concatenated to depict each LPI sample. Fourth, DNN, XGBoost, and C-SVM are used to classify lncRNA-protein pairs based on the concatenated features. Finally, a hybrid framework integrates the classification results of the three classifiers. LPI-HyADBS is compared with six classical LPI prediction approaches (LPI-SKF, LPI-NRLMF, Capsule-LPI, LPI-CNNCP, LPLNP, and LPBNI) on five datasets under 5-fold cross validations on lncRNAs, proteins, lncRNA-protein pairs, and independent lncRNAs and independent proteins. The results show that LPI-HyADBS has the best LPI prediction performance under the four cross validations. In particular, LPI-HyADBS obtains better classification ability than the other six approaches on the constructed independent dataset. Case analyses suggest an association between ZNF667-AS1 and Q15717.

Conclusions: By integrating an AdaBoost-based feature selection approach with three classification techniques (DNN, XGBoost, and C-SVM), this work develops a hybrid framework to identify new linkages between lncRNAs and proteins.
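The sample-construction and integration steps listed under Results can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the importance scores stand in for whatever AdaBoost produces, the integration rule (a weighted average of per-classifier scores) is assumed because the abstract does not specify it, and all function names are hypothetical.

```python
def top_k_indices(importances, k):
    """Return the indices of the k largest importance scores, in index order.

    The scores are assumed to come from a feature-selection step such as
    AdaBoost; here they are simply given as a list of floats.
    """
    ranked = sorted(range(len(importances)),
                    key=lambda i: importances[i], reverse=True)
    return sorted(ranked[:k])


def build_pair_sample(lnc_features, prot_features, lnc_keep, prot_keep):
    """Concatenate the selected lncRNA and protein features for one pair."""
    return ([lnc_features[i] for i in lnc_keep]
            + [prot_features[j] for j in prot_keep])


def integrate_scores(model_scores, weights=None):
    """Combine per-classifier scores by a weighted average (assumed rule).

    model_scores maps a classifier name to its list of scores, one score
    per lncRNA-protein pair; all lists must have the same length.
    """
    names = list(model_scores)
    if weights is None:
        weights = {name: 1.0 for name in names}
    total = sum(weights[name] for name in names)
    n_samples = len(model_scores[names[0]])
    return [
        sum(weights[name] * model_scores[name][i] for name in names) / total
        for i in range(n_samples)
    ]


if __name__ == "__main__":
    # Toy values, invented purely for illustration.
    lnc_keep = top_k_indices([0.1, 0.5, 0.05, 0.35], k=2)
    prot_keep = top_k_indices([0.7, 0.2, 0.1], k=1)
    sample = build_pair_sample([1.0, 2.0, 3.0, 4.0], [9.0, 8.0, 7.0],
                               lnc_keep, prot_keep)
    print("pair sample:", sample)

    scores = {"dnn": [0.9, 0.2], "xgboost": [0.6, 0.4], "c_svm": [0.75, 0.0]}
    print("integrated scores:", integrate_scores(scores))
```

Selecting lncRNA and protein features separately before concatenation mirrors the order of steps described above; an unweighted average is only one plausible way to merge three classifiers.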

Highlights

  • Motivation: RNA-protein interactions regulate many cellular processes, including splicing, polyadenylation, stability, transportation, and translation [1, 2]

  • By integrating an AdaBoost-based feature selection approach with three classification techniques, namely a deep neural network (DNN), eXtreme gradient boosting (XGBoost), and a C-support vector machine (C-SVM), this work develops a hybrid framework to identify new linkages between long noncoding RNAs (lncRNAs) and proteins

  • Label 1 is assigned if lncRNA l_i interacts with protein p_j

  • Overview of LPI-HyADBS: in this manuscript, we propose a hybrid framework for lncRNA-protein interaction (LPI) identification (LPI-HyADBS)

Summary

Introduction

Motivation: RNA-protein interactions regulate many cellular processes, including splicing, polyadenylation, stability, transportation, and translation [1, 2]. Long noncoding RNAs (lncRNAs) are densely associated with a plethora of cellular activities and play vital roles in regulating gene expression [7]. Studies have found that lncRNAs exert their regulatory roles through associations with the corresponding RNA-binding proteins, that is, lncRNA-protein interactions (LPIs) [10,11,12]. Identifying LPIs will benefit complex disease research and can advance diagnosis and treatment procedures [11]. Because laboratory methods are time-consuming and laborious, researchers have paid more attention to computational intelligence [13], and several computational methods have been reported for LPI prediction. However, these methods have the following limitations: (1) Most were evaluated on a single dataset, so their generalization ability may not have been measured. (2) The majority were validated only under cross validation on lncRNA-protein pairs; their performance under other cross validations, especially on independent lncRNAs and independent proteins, was not investigated. (3) lncRNAs and proteins carry abundant biological information, and how to select informative features needs further investigation.
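Limitation (2) turns on how samples are split. The sketch below contrasts a split over lncRNA-protein pairs with a split over entities (lncRNAs or proteins), where every pair involving a held-out entity goes to the test fold. It is an illustrative assumption about how such splits can be built, not the protocol used in the paper; the function names and the toy identifiers are hypothetical.

```python
import random


def kfold_by_pair(pairs, k=5, seed=0):
    """Yield (train, test) splits over individual lncRNA-protein pairs.

    A lncRNA or protein in the test fold may also appear in training.
    """
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    for i in range(k):
        test = shuffled[i::k]
        held = set(test)
        train = [p for p in shuffled if p not in held]
        yield train, test


def kfold_by_entity(pairs, position, k=5, seed=0):
    """Yield (train, test) splits where whole entities are held out.

    position=0 holds out lncRNAs, position=1 holds out proteins, so the
    test fold contains only entities never seen during training.
    """
    entities = sorted({p[position] for p in pairs})
    if k > len(entities):
        raise ValueError("k cannot exceed the number of distinct entities")
    random.Random(seed).shuffle(entities)
    for i in range(k):
        held = set(entities[i::k])
        test = [p for p in pairs if p[position] in held]
        train = [p for p in pairs if p[position] not in held]
        yield train, test


if __name__ == "__main__":
    # Toy pairs: 5 hypothetical lncRNAs crossed with 4 hypothetical proteins.
    pairs = [(f"l{i}", f"p{j}") for i in range(5) for j in range(4)]
    for train, test in kfold_by_entity(pairs, position=0, k=5):
        held_out = sorted({lnc for lnc, _ in test})
        print("held-out lncRNAs:", held_out, "test pairs:", len(test))
```

Under the pair-level split a model can score well by recognizing entities it already saw in training, which is why entity-level splits are a stricter test of generalization.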
