Abstract

Background

miRNAs play important roles in the regulation of gene expression. The rapidly developing field of microRNA sequencing (miRNA-seq; small RNA-seq) needs comprehensive, robust, user-friendly and standardized bioinformatics tools to analyze these large datasets. We present miRge 2.0, in which multiple enhancements were made towards these goals.

Results

miRge 2.0 has become more comprehensive, with increased functionality that includes a novel miRNA detection method, A-to-I editing analysis, integrated standardized GFF3 isomiR reporting, and improved alignment to miRNAs. The novel miRNA detection method uniquely uses both miRNA hairpin sequence structure and isomiR composition, resulting in higher specificity for identifying potential miRNAs. Using known miRNA data, our support vector machine (SVM) model predicted miRNAs with an average Matthews correlation coefficient (MCC) of 0.939 over 32 human cell datasets and outperformed miRDeep2 and miRAnalyzer in terms of phylogenetic conservation. The A-to-I editing detection correlated strongly with a reference dataset (adjusted R² = 0.96). miRge 2.0 is the most up-to-date aligner, with custom libraries for both miRBase v22 and MirGeneDB v2.0 covering 6 species (human, mouse, rat, fruit fly, nematode and zebrafish), and it includes a tool to create custom libraries. For user-friendliness, miRge 2.0 is incorporated into bcbio-nextgen and installable through Bioconda.

Conclusions

miRge 2.0 is a redesigned, leading miRNA RNA-seq aligner with several improvements and novel utilities. miRge 2.0 is freely available at: https://github.com/mhalushka/miRge.
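The abstract summarizes performance with two standard metrics: the Matthews correlation coefficient (MCC), averaged over 32 datasets, and adjusted R². As a reminder of how these are defined, here is a minimal Python sketch. The function names are hypothetical, and it assumes (this is not stated in the source) that the per-dataset MCCs are combined by a simple arithmetic mean.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC from binary confusion-matrix counts (true/false positives/negatives)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: define MCC as 0.0 when any marginal sum is zero.
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def mean_mcc(per_dataset_counts) -> float:
    """Arithmetic mean of MCC over datasets, each given as (tp, tn, fp, fn).

    Assumption: simple averaging; the paper's exact aggregation is not shown here.
    """
    if not per_dataset_counts:
        raise ValueError("need at least one dataset")
    scores = [matthews_corrcoef(*counts) for counts in per_dataset_counts]
    return sum(scores) / len(scores)


def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 for n observations and p predictors (intercept excluded)."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not data from the paper.
    print(round(matthews_corrcoef(tp=90, tn=95, fp=5, fn=10), 3))  # prints 0.851
```

MCC ranges from -1 to 1 and uses all four confusion-matrix cells, which makes it less sensitive than accuracy to imbalance between true miRNAs and other small-RNA loci.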

Highlights

  • MiRNAs play important roles in the regulation of gene expression

  • In human data, using the miRBase v22 library, miRge 2.0 aligns to 2,817 miRNAs, of which 149 are merged because of their sequence similarity

  • Whereas most miRNA alignment tools are agnostic to exact versus mismatched alignments, miRge 2.0 sets a threshold on the proportion of canonical reads to all reads for any given miRNA. This can eliminate overreporting of miRNAs in which too high a percentage of sequences are nontemplated isomiRs, likely arising from other genomic loci or from species contamination

  • miRge 2.0 provides an optional GFF3 file report, which implements the miRTop guidelines for isomiR reporting using CIGAR values
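The canonical-read threshold described in the highlights can be illustrated with a short filter. This is a minimal Python sketch, not miRge 2.0's code. The `MirnaCounts` type, the function name, the miRNA names and the threshold values are all hypothetical. It assumes that "canonical" means reads that exactly match the reference miRNA sequence, and that the total includes every isomiR read assigned to that miRNA.

```python
from dataclasses import dataclass


@dataclass
class MirnaCounts:
    canonical: int  # reads matching the reference miRNA sequence exactly
    total: int      # canonical reads plus all isomiR reads assigned to the miRNA


def filter_by_canonical_fraction(counts: dict, min_fraction: float) -> list:
    """Return sorted names of miRNAs whose canonical/total ratio is >= min_fraction.

    miRNAs with zero total reads are dropped. The threshold is a user-chosen,
    hypothetical parameter here; it is not miRge 2.0's documented default.
    """
    kept = []
    for name, c in counts.items():
        if c.total == 0:
            continue
        if c.canonical / c.total >= min_fraction:
            kept.append(name)
    return sorted(kept)


if __name__ == "__main__":
    # Hypothetical counts: miR-B is dominated by non-canonical (isomiR) reads.
    counts = {
        "miR-A": MirnaCounts(canonical=800, total=1000),
        "miR-B": MirnaCounts(canonical=2, total=500),
    }
    print(filter_by_canonical_fraction(counts, min_fraction=0.1))
```

In this toy case, only `miR-A` passes. Almost all of `miR-B`'s reads are isomiRs, which is the pattern the highlight links to reads from other loci or to contamination.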

Summary

Background

MicroRNAs (miRNAs) are short, single-stranded RNAs that post-transcriptionally regulate gene expression via mRNA decay and/or translational repression [1, 2].

Datasets to model novel miRNA detection

Sequencing datasets from 17 tissues in human and mouse (adrenal, bladder, blood, brain prefrontal cortex, colon, epididymis, heart, kidney, liver, lung, pancreas, placenta, retina, skeletal muscle, skin, testes and thyroid) were retrieved from the NCBI Sequence Read Archive (SRA) (Table 1). These samples were processed through miRge 2.0 to identify the different RNA species used as controls for machine learning.

Prediction models for novel miRNA detection

We generated measurable features associated with read cluster composition and precursor miRNA structures. These features are listed in Additional file 2: Table S1.

For isomiR reporting, the miRTop consortium has developed a consensus standard that uses CIGAR values (https://samtools.github.io/hts-specs/SAMv1.pdf). This GFF3-formatted output reports each isomiR sequence and its relationship to the miRNA precursor.

Because of a Java incompatibility on the workstation, miRAnalyzer was run on a desktop with 4 CPUs (Intel(R) Core(TM) i7-6700 CPU at 3.40 GHz) and 16 GB of DDR4 RAM.
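CIGAR strings come from the SAM specification linked above. The following minimal Python sketch builds an extended CIGAR string for an isomiR read, assuming an ungapped alignment at a known start position on its precursor. It uses the SAM `=` (sequence match) and `X` (sequence mismatch) operators. The function name is hypothetical, and the exact CIGAR convention that miRTop and miRge 2.0 use in their GFF3 output may differ in detail.

```python
from itertools import groupby


def ungapped_cigar(read: str, reference: str, start: int) -> str:
    """Extended CIGAR ('=' match, 'X' mismatch) for an ungapped read alignment.

    `start` is the 0-based position in `reference` (e.g. a precursor hairpin)
    where `read` begins. Insertions, deletions and clipping are not handled.
    """
    segment = reference[start:start + len(read)]
    if len(segment) != len(read):
        raise ValueError("read extends past the end of the reference")
    # Classify each aligned position, then run-length encode the operations.
    ops = ["=" if r == g else "X" for r, g in zip(read.upper(), segment.upper())]
    return "".join(f"{len(list(run))}{op}" for op, run in groupby(ops))


if __name__ == "__main__":
    # Toy sequences, not real miRNA data: one mismatch at the 4th read base.
    print(ungapped_cigar("CGTTCG", "ACGTACGTAC", start=1))
```

Run-length encoding keeps the string compact. A canonical read collapses to a single `=` run, so isomiRs with internal mismatches stand out immediately.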
