Abstract

MicroRNAs are conserved, endogenous small RNAs with critical post-transcriptional regulatory functions throughout Eukaryota, including prominent roles in development and disease. Despite much effort, microRNA annotations still contain errors and remain incomplete, owing especially to the challenges of identifying valid miRs that have small numbers of reads, of properly locating hairpin precursors, and of balancing precision and recall. Here, we present miRWoods, which addresses these challenges using a duplex-focused precursor detection method and stacked random forests with specialized layers to detect mature and precursor microRNAs, and which has been tuned to optimize the harmonic mean of precision and recall. We trained and tuned our discovery pipeline on data sets from the well-annotated human genome, and evaluated its performance on data from mouse. Compared to existing approaches, miRWoods better identifies precursor spans and can balance sensitivity and specificity for greater overall prediction accuracy, recalling an average of 10% more annotated microRNAs and correctly predicting substantially more microRNAs with only one read. We apply this method to the under-annotated genomes of Felis catus (domestic cat) and Bos taurus (cow). We identified hundreds of novel microRNAs in small RNA sequencing data sets from muscle and skin from cat, from 10 tissues from cow, and from human and mouse cells. Our novel predictions include a microRNA in an intron of tyrosine kinase 2 (TYK2) that is present in both cat and cow, as well as a family of mirtrons with two instances in the human genome. Our predictions support an expanded miR-2284 family in the bovine genome, a larger mir-548 family in the human genome, and a larger let-7 family in the feline genome.
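The harmonic mean of precision and recall mentioned above is commonly called the F1 score. As a minimal sketch of how such a score could be computed from prediction counts (the function name `f1_score` and the example counts are illustrative assumptions, not part of miRWoods):

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Harmonic mean of precision and recall, computed from raw counts.

    Hypothetical helper for illustration; not taken from the miRWoods code.
    """
    predicted = true_pos + false_pos   # loci the tool called as microRNAs
    annotated = true_pos + false_neg   # loci that are truly microRNAs
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / annotated if annotated else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up counts: 90 correct predictions, 10 false calls, 60 missed loci.
score = f1_score(true_pos=90, false_pos=10, false_neg=60)
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a tool cannot score well by maximizing recall at the expense of precision, or the reverse.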

Highlights

  • MicroRNAs are a highly conserved class of small endogenous RNA molecules that are involved in post-transcriptional gene silencing by acting as guide RNAs for the RNA-induced silencing complex (RISC)

  • While the computational prediction of microRNA loci from high-throughput sequence data is well-studied, challenges persist in defining the minimum number of reads required for a locus to be evaluated, as well as in defining the precursor span

  • We present a new method, “miRWoods”, which has greater recall of known microRNAs while achieving overall performance as good as or better than that of existing methods

Introduction

MicroRNAs (miRNAs, miRs) are a highly conserved class of small endogenous RNA molecules that are involved in post-transcriptional gene silencing by acting as guide RNAs for the RNA-induced silencing complex (RISC). The biogenesis of microRNAs begins with the generation of a primary transcript (pri-miR), which folds into a structure containing one or more ~70-nt hairpins. These hairpin precursors (pre-miRs) are cut at the base by Drosha [1]. The resultant double-stranded RNA duplex is unwound to produce two mature ~22-nt microRNAs (miRs), named 5′ and 3′ after the arm of the hairpin from which they derive. The seed sequence at positions 2–8 of RISC-bound mature microRNAs binds to complementary sequences in the 3′ untranslated regions (UTRs) of mRNAs
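To make the seed-pairing idea concrete, the following minimal sketch extracts positions 2–8 of a mature microRNA and looks for perfectly complementary sites in a UTR. It assumes strict Watson-Crick pairing (no G:U wobble or mismatches), and the helper names and the toy UTR string are hypothetical; it is an illustration only, not a target-prediction method and not part of miRWoods.

```python
# Watson-Crick complement table for RNA (assumption: no G:U wobble pairs).
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed(mirna: str) -> str:
    """Return the seed: positions 2-8 (1-based) of the mature microRNA."""
    return mirna[1:8]


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA string, read 5' to 3'."""
    return rna.translate(COMPLEMENT)[::-1]


def find_seed_sites(mirna: str, utr: str) -> list[int]:
    """0-based start positions in `utr` that pair perfectly with the seed."""
    site = reverse_complement(seed(mirna))
    return [
        i
        for i in range(len(utr) - len(site) + 1)
        if utr[i:i + len(site)] == site
    ]


# Mature let-7a-5p sequence; the UTR below is a made-up toy string.
let7a = "UGAGGUAGUAGGUUGUAUAGUU"
toy_utr = "AAACUACCUCAAA"
sites = find_seed_sites(let7a, toy_utr)
```

Here the seed of let-7a is `GAGGUAG`, so the sketch searches the UTR for its reverse complement, `CUACCUC`.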
