Abstract
MicroRNAs (miRNAs) are a class of 20–23 nucleotide small RNAs that regulate gene expression post-transcriptionally in animals and plants. Annotation of miRNAs in the miRNA database (miRBase) has largely relied on computational approaches. As a result, many miRBase entries lack experimental validation, and discrepancies between miRBase annotation and actual miRNA sequences are often observed. In this study, we integrated small RNA sequencing (smRNA-seq) datasets from Caenorhabditis elegans and Drosophila melanogaster and devised an analytical pipeline, coupled with detailed manual inspection, to systematically curate miRNA annotation in miRBase. Our analysis reveals that 19 (17.0%) and 51 (31.3%) of the miRNA entries with detectable smRNA-seq reads have mature sequence discrepancies in C. elegans and D. melanogaster, respectively. These discrepancies frequently occur either for conserved miRNA families whose mature sequences were predicted from their homologous counterparts in other species, or for miRNAs whose precursor miRNA (pre-miRNA) hairpins produce abundant multiple miRNA isoforms or variants. Our analysis shows that while Drosophila pre-miRNAs, on average, produce less than 60% accurate mature miRNA reads, with the remainder being 5′ and 3′ variant isoforms, the precision of miRNA processing in C. elegans is much higher, at over 90%. Based on the revised miRNA sequences, we analyzed expression patterns of the more conserved (MC) and less conserved (LC) miRNAs and found that, whereas MC miRNAs are often co-expressed at multiple developmental stages, LC miRNAs tend to be expressed specifically at fewer stages.
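The processing-precision figures quoted above can be read as the fraction of reads from a hairpin arm that match the mature sequence exactly, with 5′ and 3′ variant isoforms making up the rest. The following is a minimal sketch under that assumption; the function name, the input layout, and the toy sequences and counts are hypothetical and are not taken from the study's pipeline or data.

```python
from typing import Dict, Optional


def processing_precision(read_counts: Dict[str, int], mature_seq: str) -> Optional[float]:
    """Fraction of reads that exactly match the mature sequence.

    read_counts maps each distinct read sequence from one hairpin arm
    (the exact mature form plus any 5'/3' variants) to its read count.
    Returns None when the arm has no reads at all.
    """
    total = sum(read_counts.values())
    if total == 0:
        return None
    return read_counts.get(mature_seq, 0) / total


# Toy, made-up example: one exact form and two end variants.
toy_mature = "UGAGCUAGCUAGGCUAUCGAUU"
toy_counts = {
    toy_mature: 90,                   # exact mature sequence
    toy_mature + "A": 6,              # hypothetical 3' variant (one extra nt)
    toy_mature[1:]: 4,                # hypothetical 5' variant (one nt shorter)
}
toy_precision = processing_precision(toy_counts, toy_mature)
print(f"precision = {toy_precision:.2f}")
```

A per-species average would then be taken over the per-hairpin values, skipping hairpins for which the function returns None.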
Highlights
MicroRNAs are a class of small RNA molecules that mediate post-transcriptional regulation of gene expression by pairing with complementary sites on mRNA transcripts.
Based on the mature miRNA sequences annotated in the miRNA database (miRBase), the typical sizes of animal and plant miRNAs peak at 22 and 21 nt, respectively, while 22- and 23-nt miRNAs occur at equal frequency in C. elegans.
Because authentic miRNAs are usually conserved among closely related species, we first classified miRNA families into two groups according to their relative conservation. miRNAs that have homologs outside Hexapoda or Nematoda were termed more conserved (MC), while those found only in Hexapoda or Nematoda were termed less conserved (LC).
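The MC/LC rule can be sketched as a simple set test. This is an illustrative reading rather than the authors' code: it assumes each family comes with the set of clades in which homologs have been found, and that the focal clade is Hexapoda for D. melanogaster and Nematoda for C. elegans. The function name and the example clade sets are hypothetical.

```python
from typing import Set


def classify_family(clades_with_homologs: Set[str], focal_clade: str) -> str:
    """Label a miRNA family 'MC' or 'LC'.

    'MC' (more conserved): at least one homolog lies outside the focal clade.
    'LC' (less conserved): homologs are found only within the focal clade.
    """
    if focal_clade not in clades_with_homologs:
        raise ValueError("family must be present in its own focal clade")
    outside = clades_with_homologs - {focal_clade}
    return "MC" if outside else "LC"


# Hypothetical inputs: a family seen in several clades, and one restricted
# to insects.
widely_found = {"Hexapoda", "Nematoda", "Vertebrata"}
insect_only = {"Hexapoda"}

print(classify_family(widely_found, "Hexapoda"))  # family with outside homologs
print(classify_family(insect_only, "Hexapoda"))   # family restricted to Hexapoda
```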
Summary
MicroRNAs (miRNAs) are a class of small RNA molecules that mediate post-transcriptional regulation of gene expression by pairing with complementary sites on mRNA transcripts (reviewed by Carthew and Sontheimer, 2009). During the past 2 years, the total number of registered miRNAs in miRBase has increased from 6,306 in release 11.0 to 14,197 in the current release 15.0 (http://www.mirbase.org). This dramatic expansion is largely due to the adoption of next-generation high-throughput sequencing technology. There are currently three main sources of miRNA entries: experimentally cloned miRNAs with functional validation, collected from the published literature; homologous miRNAs identified by sequence alignment but lacking experimental verification; and miRNAs directly captured by small RNA sequencing (smRNA-seq) platforms. The mature sequence of let-7, a highly conserved miRNA in animal species, is one nucleotide longer in Caenorhabditis elegans than in Drosophila melanogaster (Figure S1 in Supplementary Material).