Abstract

Non-coding RNAs (ncRNAs) are RNA molecules that do not encode proteins but take part in biological processes, including gene expression. Like proteins, they can fold into complex structures to perform a wide array of biological functions. Because the folded structure of an ncRNA may be critical to its function, many studies have attempted to exploit structural data to infer information, often using machine learning techniques. For instance, predicted secondary structures have been used as input features for machine learning models that classify RNA sequences. However, a strand of RNA can fold into more than one possible structure, and some strands even form different structures in vivo and in vitro. Furthermore, ncRNAs often function as RNA-protein complexes, which can affect structure. We therefore hypothesized that using a single predicted secondary structure for each sequence may discard important information, which may result in poorer classification accuracy. To investigate this hypothesis, we propose the use of secondary structure fingerprints as features for machine learning applications, and report on a preliminary evaluation of this approach. The fingerprints comprise two categories: a higher-level (topological) representation derived from RNA-As-Graphs (RAG), and free energy fingerprints based on a novel curated repertoire of small RNA motifs. We also evaluated our deep learning architecture with k-mers as features, alone and combined with secondary structure fingerprints, to determine whether secondary structure or nucleotide composition is more useful for RNA classification, and whether the two feature types complement each other. The dataset, trained models, and supplemental material of this study are available at https://www.site.uottawa.ca/turcotte/bibm2020.
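
To make the k-mer feature type concrete, the following is a minimal Python sketch of how a sequence might be turned into a fixed-length k-mer frequency vector and concatenated with a structure fingerprint. It is an illustration only, not the pipeline used in the study: the abstract does not state the value of k, the normalization, or how the two feature types are combined, so the choice of k, the frequency normalization, the function names `kmer_frequencies` and `combine_features`, and the placeholder fingerprint values are all assumptions.

```python
from itertools import product


def kmer_frequencies(sequence, k=3, alphabet="ACGU"):
    """Return normalized k-mer frequencies in a fixed lexicographic order.

    Assumption: k and the normalization are illustrative choices,
    not values taken from the study.
    """
    # Treat DNA-style input as RNA so that T and U are counted together.
    seq = sequence.upper().replace("T", "U")
    # Enumerate all len(alphabet)**k possible k-mers once, in a stable order.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


def combine_features(kmer_vector, fingerprint_vector):
    """Concatenate the two feature types (hypothetical combination scheme)."""
    return list(kmer_vector) + list(fingerprint_vector)


if __name__ == "__main__":
    kmer_vec = kmer_frequencies("GGGAAACUUCGGUUUCCC", k=3)
    # Placeholder values standing in for a secondary structure fingerprint.
    fingerprint = [0.0, 1.0, 0.5]
    features = combine_features(kmer_vec, fingerprint)
    print(len(kmer_vec), len(features))
```

Fixing the k-mer order up front gives every sequence a vector of the same length regardless of its own length, which is what allows it to be concatenated with another fixed-length feature vector.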
