Abstract

Apart from protein-coding ribonucleic acids (RNAs), there exists a variety of non-coding RNAs (ncRNAs) which regulate complex cellular and molecular processes. High-throughput sequencing technologies and bioinformatics approaches have greatly advanced the exploration of ncRNAs, revealing their crucial roles in gene regulation, miRNA binding, protein interactions, and splicing. Furthermore, ncRNAs are involved in the development of complex diseases such as cancer. Categorization of ncRNAs is essential to understand the mechanisms of diseases and to develop effective treatments, and sub-cellular localization information helps explain their diverse functionalities. To date, several computational methodologies have been proposed to precisely identify the class as well as the sub-cellular localization patterns of RNAs. This paper discusses different types of ncRNAs and reviews computational approaches proposed in the last 10 years to distinguish coding RNA from ncRNA, to identify sub-types of ncRNAs such as piwi-associated RNA, micro RNA, long ncRNA, and circular RNA, and to determine the sub-cellular localization of distinct ncRNAs and RNAs. Furthermore, it summarizes diverse ncRNA classification and sub-cellular localization datasets along with benchmark performance to aid the development and evaluation of novel computational methodologies. It identifies research gaps, heterogeneity, and challenges in the development of computational approaches for RNA sequence analysis. We expect that our expert analysis will help Artificial Intelligence researchers find, in one place, the state-of-the-art performance and suitable models for various tasks, the most widely used sequence descriptors and neural architectures, and guidance for interpreting inter-species and intra-species performance deviation.

Highlights

  • Using benchmark datasets of human and mouse species taken from the GENCODE database [68], LncRScan-SVM combined features extracted from transcript sequence, gene structure, codon sequence, and conservation, and achieved a performance of 92% with a support vector machine (SVM) classifier on the task of distinguishing long ncRNAs (lncRNAs) from messenger RNAs (mRNAs)

  • The importance of lncRNA, mRNA, and miRNA and computational methodologies proposed to determine their biological functionalities through sub-cellular localization are briefly discussed below

  • Because sequence descriptors introduce significant bias and irrelevant features when generating encodings, the use of feature selection approaches soon became a frontier in the development of robust lncRNA sub-cellular localization prediction approaches
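
As a rough illustration of the sequence descriptor and feature selection steps mentioned in the highlights, the sketch below encodes RNA sequences as normalized k-mer frequencies and keeps only the highest-variance features. It is a minimal example under stated assumptions, not the method of LncRScan-SVM or of any reviewed tool: the function names `kmer_frequencies` and `select_by_variance`, the toy sequences, and the use of a variance filter as a stand-in for feature selection are all hypothetical choices made for illustration.

```python
from itertools import product
from statistics import pvariance


def kmer_frequencies(seq, k=3, alphabet="ACGU"):
    """Return normalized k-mer frequencies in a fixed lexicographic order.

    Hypothetical helper: DNA-style input is mapped to RNA (T -> U) and
    windows containing symbols outside the alphabet (e.g. N) are skipped.
    """
    seq = seq.upper().replace("T", "U")
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def select_by_variance(matrix, top_n):
    """Return the column indices of the top_n highest-variance features.

    A simple unsupervised filter, used here only as a stand-in for the
    feature selection approaches discussed in the text.
    """
    n_features = len(matrix[0])
    variances = [pvariance([row[j] for row in matrix]) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:top_n])


# Toy sequences (made up for illustration, not taken from any benchmark).
toy_sequences = ["AUGGCUAAGGCU", "GGGCCCAUAUAU", "AUAUAUAUGCGC"]
features = [kmer_frequencies(s, k=2) for s in toy_sequences]
kept = select_by_variance(features, top_n=5)
reduced = [[row[j] for j in kept] for row in features]
```

The reduced feature vectors could then be passed to any classifier, such as an SVM; in practice, supervised selection criteria and larger values of k are common alternatives to the variance filter used here.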

Summary

Introduction

The focus of this study is to shed light on distinct kinds of ncRNAs, discuss their biological importance, and review machine and deep learning approaches proposed over time to identify the sub-types of ncRNAs and to predict their sub-cellular localization. It also provides an interactive summary of benchmark datasets developed to evaluate computational approaches for various tasks. Specifically, the study offers:

  • A bird's eye view of the biological significance of diverse ncRNA species, their involvement in a wide range of cellular processes and disease development, and their potential to act as biomarkers

  • A discussion of the importance of distinguishing ncRNAs from protein-coding transcripts and of identifying their sub-types, taking into account the heterogeneity of ncRNAs in terms of sequence length, structure, and physical and chemical characteristics

  • An account of the significance of ncRNA sub-cellular localization information for understanding the core functionality of ncRNAs and their involvement in different biological processes

  • A review of the progress of Artificial Intelligence on distinct ncRNA sequence analysis tasks, including distinguishing ncRNAs from protein-coding transcripts, identifying the sub-types of ncRNAs, and predicting sub-cellular localization

  • A critical analysis of the diverse computational approaches proposed for different ncRNA sequence analysis tasks at different levels, such as feature representation, feature selection, classification, and cross-species evaluation

  • An interactive yet in-depth descriptive analysis of benchmark datasets developed from public databases for diverse ncRNA sequence analysis tasks

RNA Classification
Distinguishing Long Non-Coding RNA from Protein Coding RNA
Method
Source Code Availability
Code Availability
Identification of Long Intergenic RNAs
Distinguishing Circular RNAs from Long Non-Coding RNAs
Identification of Small Non-Coding RNAs
Segregating Small and Long Non-Coding RNAs
Family Classification of Small Non-Coding RNAs
Computational Methodologies for Clustering of Non-Coding RNA
Messenger RNA Sub-Cellular Localization
MicroRNAs Sub-Cellular Localization
Long Non-Coding RNA Sub-Cellular Localization
Multi-Label Sub-Cellular Localization Prediction of Diverse RNAs
Benchmark Sub Cellular Localization Datasets
Findings
Current Challenges and Future Directions
