Abstract

Circular RNA (circRNA) is a class of long non-coding RNA (lncRNA) that forms a covalently closed loop through back-splicing. Emerging evidence indicates that circRNAs can influence cellular physiology through various molecular mechanisms. Accurately identifying circRNAs and predicting their regulatory information is therefore critical for understanding their biogenesis. Although several machine learning-based computational tools have been proposed for circRNA identification, their prediction accuracy leaves room for improvement. Here, we first present circLGB, a machine learning-based framework that discriminates circRNAs from other lncRNAs. circLGB integrates commonly used sequence-derived features with three new features: adenosine-to-inosine (A-to-I) deamination, A-to-I density, and the internal ribosome entry site (IRES). It classifies circRNAs using a LightGBM classifier with feature selection. Second, we introduce circMRT, an ensemble machine learning framework that systematically predicts regulatory information for circRNAs, including their interactions with microRNAs (miRNAs) and RNA-binding proteins (RBPs), and their transcriptional regulation. circMRT models feature sets comprising sequence-based features, graph features, genome context, and regulatory information. Experiments on public datasets and on datasets we constructed show that the proposed algorithms outperform available state-of-the-art methods. circLGB is available at http://www.circlgb.com. Source code is available at https://github.com/Peppags/circLGB-circMRT.

Highlights

  • Circular RNA constitutes a unique class of RNAs characterized by a covalently closed cyclic structure without a polyadenylated tail (Lasda and Parker, 2014)

  • We introduce two machine learning-based methods, circLGB and circMRT, that combine sequence and structure information to distinguish circRNAs from other long non-coding RNAs (lncRNAs) and to predict their regulatory interactions, respectively. circLGB extracts commonly used features and three new features, namely adenosine-to-inosine (A-to-I) deamination, A-to-I density, and the internal ribosome entry site (IRES), and then identifies circRNAs using a LightGBM classifier

  • We propose a two-step feature optimization strategy to select the most discriminative features. circLGB outperforms state-of-the-art methods on public datasets and on our own datasets. circMRT integrates sequence-based features, graph features, genome context, and regulatory information to predict circRNA interactions with miRNAs and RNA-binding proteins (RBPs), as well as transcriptional regulation (TR)
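
To make the idea of sequence-derived features concrete, the following is a minimal sketch of how a feature vector for one transcript might be assembled. It is not the circLGB implementation: the function names, the choice of k-mer frequencies as the "commonly used" features, and the definition of A-to-I density as edited adenosines per nucleotide are all assumptions made for illustration, and the editing-site positions are a hypothetical input.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=3):
    """Normalized k-mer frequencies over the RNA alphabet (illustrative feature choice)."""
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product("ACGU", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only count k-mers made of A/C/G/U so ambiguous bases do not skew the totals.
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


def a_to_i_density(seq, editing_sites):
    """Edited adenosines per nucleotide.

    Assumption: density = (number of annotated editing sites that fall on an A)
    divided by sequence length. editing_sites holds hypothetical 0-based positions.
    """
    seq = seq.upper().replace("T", "U")
    valid = {p for p in editing_sites if 0 <= p < len(seq) and seq[p] == "A"}
    return len(valid) / len(seq) if seq else 0.0


def feature_vector(seq, editing_sites, k=3):
    """Combine k-mer frequencies with the assumed A-to-I density into one dict."""
    features = kmer_frequencies(seq, k)
    features["a_to_i_density"] = a_to_i_density(seq, editing_sites)
    return features


if __name__ == "__main__":
    example = feature_vector("AUGCAUGCAA", editing_sites=[0, 4, 5], k=2)
    print(len(example), example["a_to_i_density"])
```

Vectors of this kind, one per transcript, could then be passed through feature selection and on to a gradient-boosting classifier such as LightGBM, which is the classifier the authors name.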


Summary

Introduction

Circular RNA (circRNA) constitutes a unique class of RNAs characterized by a covalently closed cyclic structure without a polyadenylated tail (Lasda and Parker, 2014). During pre-mRNA splicing, the 5′ and 3′ termini of exons can be covalently ligated to form circRNAs (Barrett et al., 2015; Wang and Wang, 2015). Owing to their circular structure and lack of free ends (Awasthi et al., 2018), circRNAs are more stable and more conserved across species than linear RNAs (Jeck et al., 2013). Because of their stability and their association with disease, circRNAs are promising disease biomarkers (Zhang et al., 2018).

