Abstract

RNA elements that are transcribed but not translated into proteins are called non-coding RNAs (ncRNAs). They play wide-ranging roles in biological processes and disorders. Just like proteins, their structure is often intimately linked to their function. Many examples have been documented where structure is conserved across taxa despite sequence divergence. Thus, structure is often used to identify function. Specifically, the secondary structure is predicted and ncRNAs with similar structures are assumed to have same or similar functions. However, a strand of RNA can fold into multiple possible structures, and some strands even fold differently in vivo and in vitro. Furthermore, ncRNAs often function as RNA-protein complexes, which can affect structure. Because of these, we hypothesized using one structure per sequence may discard information, possibly resulting in poorer classification accuracy. Therefore, we propose using secondary structure fingerprints, comprising two categories: a higher-level representation derived from RNA-As-Graphs (RAG), and free energy fingerprints based on a curated repertoire of small structural motifs. The fingerprints take into account the difference between global and local structural matches. We also evaluated our deep learning architecture with k-mers. By combining our global-local fingerprints with 6-mer, we achieved an accuracy, precision, and recall of 91.04%, 91.10%, and 91.00%.

Highlights

  • N ON-CODING RNAS are RNA molecules that do not code for protein but serve several functions in living cells, including the regulation of gene expression at all levels: transcription, splicing, and translation [1]

  • Based on criteria such function and length, as well as primary, secondary, tertiary, and quaternary structure, they can be further classified into different subclasses [11]; including micro RNA, transfer RNA, CDbox, riboswitch, and small nuclear RNA [12]

  • The secondary structure fingerprints are formed by checking whether or not there are specific structural motifs that can be formed by the input sequence, or parts of it, scoring any matches that are found using the values derived from these scores

Read more

Summary

Introduction

N ON-CODING RNAS (ncRNAs) are RNA molecules that do not code for protein but serve several functions in living cells, including the regulation of gene expression at all levels: transcription, splicing, and translation [1]. They have been found to play important roles in diseases such as cancer [2], [3], [4] and multiple sclerosis [5], to name a few examples. Prior studies on ncRNA classification have attempted to utilize such structural features

Methods
Results
Discussion
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.