Abstract

In silico searches for microRNAs (miRNAs) have been driven by methods that compile structural features of the miRNA precursor hairpin and, to some degree, combine these with the analysis of RNA-seq profiles, in which miRNAs typically leave the Drosha/Dicer fingerprint of 1–2 blocks of ~22 nt reads corresponding to the mature and star miRNA. To complement previous methods, we present a study in which we systematically exploit these read profile patterns. We created two datasets comprising 2540 and 4795 read profiles obtained after preprocessing short RNA-seq data from miRBase and ENCODE, respectively. Of the 4795 ENCODE read profiles, 1361 are annotated as non-coding RNAs (ncRNAs), of which 285 are further annotated as miRNAs. Using deepBlockAlign (dba), we align ncRNA read profiles from ENCODE against the miRBase read profiles (cleaned for “self-matches”) and are able to separate ENCODE miRNAs from the other ncRNAs with a Matthews correlation coefficient (MCC) of 0.8 and an area under the curve (AUC) of 0.93. Based on the dba score cut-off of 0.7, at which we observed the maximum MCC of 0.8, we predict 523 novel miRNA candidates. An additional RNA secondary structure analysis reveals that 42 of the candidates overlap with predicted conserved secondary structure. Further analysis reveals that the 523 miRNA candidates are located in genomic regions with MAF block (UCSC) fragmentation and poor sequence conservation, which in part might explain why they have been overlooked in previous efforts. We further analyzed known human and mouse miRNA read profiles and found two distinct classes: the first containing two blocks of reads and the second containing >2 blocks. The latter class also holds read profiles with a less well-defined arrangement of reads than the former class.
Comparing miRNA read profiles from plants and animals, we observed kingdom-specific read profiles that differ from each other in both length and the distribution of reads within the profile. All the data, as well as a server to search miRBase read profiles by uploading a BED file, are available at http://rth.dk/resources/mirdba.
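The evaluation described in the abstract (choosing the score cut-off that maximizes the MCC, and summarizing separation by the AUC) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the scores and labels below are invented toy values, and the function names are hypothetical. It only assumes that each read profile has a single alignment score and a binary miRNA/non-miRNA annotation.

```python
from math import sqrt

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # convention when any marginal count is zero
    return (tp * tn - fp * fn) / denom

def mcc_at_cutoff(scores, labels, cutoff):
    """Call a profile 'miRNA' when score >= cutoff, then score the calls."""
    tp = fp = tn = fn = 0
    for score, is_mirna in zip(scores, labels):
        predicted = score >= cutoff
        if predicted and is_mirna:
            tp += 1
        elif predicted:
            fp += 1
        elif is_mirna:
            fn += 1
        else:
            tn += 1
    return mcc(tp, fp, tn, fn)

def best_cutoff(scores, labels):
    """Return (cutoff, MCC) maximizing MCC over the observed score values."""
    candidates = sorted(set(scores))
    return max(((c, mcc_at_cutoff(scores, labels, c)) for c in candidates),
               key=lambda pair: pair[1])

def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly;
    ties count as half."""
    pos = [s for s, is_mirna in zip(scores, labels) if is_mirna]
    neg = [s for s, is_mirna in zip(scores, labels) if not is_mirna]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy data (hypothetical): alignment scores and miRNA annotations.
toy_scores = [0.95, 0.90, 0.80, 0.72, 0.65, 0.50, 0.35, 0.20, 0.15, 0.10]
toy_labels = [True, True, True, False, True, False, False, False, False, False]

cutoff, best_mcc = best_cutoff(toy_scores, toy_labels)
print(f"best cut-off = {cutoff}, MCC = {best_mcc:.3f}, "
      f"AUC = {auc(toy_scores, toy_labels):.3f}")
```

Scanning only the observed score values is sufficient here, because the confusion matrix can change only at those values.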

Highlights

  • MicroRNAs are small, non-coding RNAs 18–24 nucleotides in length that play important roles in various biological and metabolic processes, including signal transduction, developmental timing, cell maintenance and differentiation (Zhang et al, 2006b)

  • ROC curve analysis using the R package ROCR (Sing et al, 2005) showed a high AUC of 0.93, suggesting that miRNA read profiles have characteristic features distinct from those of other non-coding RNAs (ncRNAs) and can be employed for confident miRNA prediction (Figure 4B)



Introduction

MicroRNAs (miRNAs) are small, non-coding RNAs 18–24 nucleotides in length that play important roles in various biological and metabolic processes, including signal transduction, developmental timing, cell maintenance and differentiation (Zhang et al, 2006b). Many in silico approaches have been developed based on major characteristics of miRNAs, for example the hairpin-shaped stem-loop structure, integrated with homology search (Wang et al, 2005; Dezulian et al, 2006) or evolutionary conservation (Lai et al, 2003; Lim et al, 2003). Methods based on phylogenetic shadowing (Berezikov et al, 2005), neighbor stem-loop search (Ohler et al, 2004), minimal folding free energy index (Zhang et al, 2006a) and machine learning have also been developed (Table 1). Various plant and animal miRNAs have been identified using these computational approaches. Many of these methods have sensitivity problems and yield a number of false positives (Bentwich, 2005). Taken together, all search methods aim to reduce the search space in their own respective ways (Lindow and Gorodkin, 2007).

