Abstract

Background: Long noncoding RNAs (lncRNAs) are closely linked to various biological processes. Identifying interacting lncRNA-protein pairs helps in understanding the functions and mechanisms of lncRNAs. Wet-lab experiments are costly and time-consuming. Most computational methods fail to account for the imbalanced nature of lncRNA-protein interaction (LPI) data. More importantly, they were evaluated on a single dataset, which introduces prediction bias.

Results: In this study, we develop an ensemble framework (LPI-EnEDT) with extra tree and decision tree classifiers to classify imbalanced LPI data. First, five LPI datasets are assembled. Second, lncRNAs and proteins are characterized separately using Pyfeat and BioTriangle, and the resulting features are concatenated into a single vector that represents each lncRNA-protein pair. Finally, an ensemble framework with extra tree and decision tree classifiers is developed to classify unlabeled lncRNA-protein pairs. The comparative experiments demonstrate that LPI-EnEDT outperforms four classical LPI prediction methods (LPI-BLS, LPI-CatBoost, LPI-SKF, and PLIPCOM) under cross validations on lncRNAs, proteins, and LPIs. The average AUC values on the five datasets are 0.8480, 0.7078, and 0.9066 under the three cross validations, respectively. The average AUPRs are 0.8175, 0.7265, and 0.8882, respectively. Case analyses suggest underlying associations between HOTTIP and Q9Y6M1 and between NRON and Q15717.

Conclusions: By fusing diverse biological features of lncRNAs and proteins and exploiting an ensemble learning model with extra tree and decision tree classifiers, this work focuses on imbalanced LPI data classification as well as interaction inference for a new lncRNA (or protein).
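The three cross-validation settings named in the abstract (on lncRNAs, on proteins, and on LPIs) differ in what is held out. The sketch below illustrates one common reading of these settings and is not taken from the paper: it assumes that cross validation on lncRNAs (or proteins) holds out every pair involving a held-out lncRNA (or protein), while cross validation on LPIs holds out random pairs. The function names, the fold count of 5, and the toy identifiers are all hypothetical.

```python
import random


def entity_folds(pairs, key_index, n_folds=5, seed=0):
    """Yield (train, test) splits in which held-out entities never appear in train.

    key_index=0 groups pairs by lncRNA, key_index=1 groups them by protein.
    The fold count is an illustrative assumption, not a value from the paper.
    """
    entities = sorted({pair[key_index] for pair in pairs})
    random.Random(seed).shuffle(entities)
    # Assign each entity to exactly one fold, round-robin after shuffling.
    fold_of = {entity: i % n_folds for i, entity in enumerate(entities)}
    for fold in range(n_folds):
        test = [p for p in pairs if fold_of[p[key_index]] == fold]
        train = [p for p in pairs if fold_of[p[key_index]] != fold]
        yield train, test


def pair_folds(pairs, n_folds=5, seed=0):
    """Yield (train, test) splits over individual lncRNA-protein pairs."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    for fold in range(n_folds):
        test = [p for i, p in enumerate(shuffled) if i % n_folds == fold]
        train = [p for i, p in enumerate(shuffled) if i % n_folds != fold]
        yield train, test


# Toy data with made-up identifiers: every lncRNA paired with every protein.
lncrnas = [f"lnc{i}" for i in range(10)]
proteins = [f"prot{j}" for j in range(6)]
pairs = [(lnc, prot) for lnc in lncrnas for prot in proteins]

for train, test in entity_folds(pairs, key_index=0):
    held_out = {lnc for lnc, _ in test}
    # No training pair involves a held-out lncRNA.
    assert held_out.isdisjoint(lnc for lnc, _ in train)
```

Under this reading, the entity-level splits test how well a model infers interactions for an lncRNA or protein it has never seen, which is the harder case and matches the abstract's stated goal of inference for a new lncRNA (or protein).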

Highlights

  • Noncoding RNAs are molecules that regulate various fundamental cellular processes in complex organisms at a genome-wide level [1]

  • To address these two problems, we develop an ensemble framework (LPI-EnEDT) with extra tree and decision tree classifiers to infer new lncRNA-protein interactions (LPIs)

  • Datasets 4 and 5 provide LPI data from Arabidopsis thaliana and Zea mays, respectively. The lncRNA and protein sequence information is obtained from the plant lncRNA database (PlncRNADB [45])

Summary

Introduction

Motivation: Noncoding RNAs are molecules that regulate various fundamental cellular processes in complex organisms at a genome-wide level [1]. Long noncoding RNAs (lncRNAs) are a class of noncoding RNAs longer than 200 nucleotides. The biological functions of only a few lncRNAs have been revealed. Identifying these functions helps to advance our knowledge of this class of molecules [16]. lncRNAs are closely linked to various biological processes. Most computational methods fail to account for the imbalanced nature of lncRNA-protein interaction (LPI) data. They were also evaluated on a single dataset, which introduces prediction bias.

