Abstract

RNA plays important roles in almost every aspect of biology, and every aspect of RNA biology is influenced by its folding. This is a particularly important consideration in the era of high-throughput sequencing, when the discovery of novel transcripts far outpaces our knowledge of their functions. Gaining a comprehensive picture of biology requires a structural framework for making functional inferences about RNA. To this end, we have developed the RNA Structurome Database (https://structurome.bb.iastate.edu), a comprehensive repository of RNA secondary structural information that spans the entire human genome. Here, we compile folding information for every base pair of the genome that may be transcribed: coding, noncoding, and intergenic regions, as well as repetitive elements, telomeres, etc. This was done by fragmenting the GRCh38 reference genome into 154,414,320 overlapping sequence fragments and, for each fragment, calculating a set of metrics based on the sequence’s folding properties. These data will facilitate a wide array of investigations, e.g. the discovery of structured regulatory elements in differential gene expression data or the discovery of noncoding RNAs, and will allow genome-scale analyses of RNA folding.
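The fragmentation step described above can be pictured as a sliding window over the sequence. The following is only a minimal sketch: the function name `overlapping_windows` and the `window` and `step` parameters are hypothetical, and the values used in the example are illustrative rather than the ones used to build the database, which this text does not state.

```python
def overlapping_windows(sequence, window, step):
    """Yield (start, fragment) pairs for overlapping fragments of `sequence`.

    `window` and `step` are hypothetical parameters chosen for illustration;
    fragments overlap whenever step < window.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    # Stop at the last position where a full-length window still fits.
    # A sequence shorter than the window is returned as a single fragment.
    last_start = max(len(sequence) - window, 0)
    for start in range(0, last_start + 1, step):
        yield start, sequence[start:start + window]


# Toy example: a 10-nt sequence, 4-nt windows, 2-nt step.
fragments = list(overlapping_windows("ACGUACGUAC", window=4, step=2))
```

Each fragment would then be passed to a folding program, and the resulting metrics stored against the fragment's genomic coordinates.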

Highlights

  • Once thought to be solely an intermediary between the genome and proteome, RNA is now known to be a key player in the biology of all living things

  • Two primary training parameters are used for ncRNA classification: a structure conservation index (SCI), which measures conservation of secondary structure, and a thermodynamic z-score, which measures the propensity of a particular sequence to form a defined and energetically stable structure

  • The RNAStructuromeDB holds the results of a genome-wide computational analysis in which we folded the entire human genome
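
As a rough illustration of the thermodynamic z-score mentioned in the highlights, the sketch below compares the folding energy of a native sequence with the energies of shuffled versions of the same sequence: z = (E_native − mean(E_shuffled)) / sd(E_shuffled). A negative value suggests the native ordering is more stable than expected from its composition alone. Everything here is an assumption made for illustration: `thermodynamic_z_score` and `toy_energy` are hypothetical names, the shuffling is simple mononucleotide shuffling, and `toy_energy` is a crude stand-in that only counts mirrored complementary pairs. A real analysis would substitute a minimum free energy from a folding package.

```python
import random
import statistics

# Canonical and G-U wobble pairs, used only by the toy energy function.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def toy_energy(seq):
    """Stand-in for a folding energy (NOT a real thermodynamic model).

    Scores -1 for each position i that can pair with its mirror position,
    as in a single idealized hairpin.
    """
    half = len(seq) // 2
    return -float(sum((seq[i], seq[-1 - i]) in PAIRS for i in range(half)))


def thermodynamic_z_score(sequence, energy_fn, n_shuffles=100, seed=0):
    """Return (E_native - mean(E_shuffled)) / sd(E_shuffled)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    native = energy_fn(sequence)
    shuffled_energies = []
    for _ in range(n_shuffles):
        letters = list(sequence)
        rng.shuffle(letters)  # mononucleotide shuffle keeps base composition
        shuffled_energies.append(energy_fn("".join(letters)))
    mean = statistics.mean(shuffled_energies)
    sd = statistics.stdev(shuffled_energies)
    if sd == 0:
        return 0.0  # all shuffles scored the same; z-score is undefined
    return (native - mean) / sd


# A hairpin-like toy sequence: four G-C pairs closing an A-rich loop.
z = thermodynamic_z_score("GGGGAAAACCCC", toy_energy)
```

Passing the energy function as an argument keeps the z-score logic independent of any particular folding library.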


Summary

Introduction

Once thought to be solely an intermediary between the genome and proteome, RNA is now known to be a key player in the biology of all living things (as well as viruses, viroids and transposable elements). Collections of ncRNA sequences are being built into databases such as Rfam[15,23], lncRNAdb[24,25], LNCipedia[26,27], miRBase[28–32] and RNAcentral[33,34,35]. These important projects are compiling well-annotated and, in many cases, functionally validated ncRNAs alongside other valuable data. Rfam entries contain information describing ncRNA biosynthesis, localization, phylogenetic distribution and functional roles, as well as evolutionary conservation of primary sequence and, importantly, secondary structure. The identification of so many deeply conserved structured RNAs highlights their likely ubiquity and importance. In both coding and noncoding RNAs, secondary structure plays key roles throughout their functions. Dynamic RNA structure is also significant in disease: single nucleotide polymorphisms (SNPs) can affect RNA folding in ways that impede healthy function by disrupting specific motifs or altering conformational equilibria[51,52].
