Abstract

Background: Understanding the biological roles of microRNAs (miRNAs) is an active area of research that has produced a surge of publications in PubMed, particularly in cancer research. Along with this increasing interest, many open-source bioinformatics tools to identify existing and/or discover novel miRNAs in next-generation sequencing (NGS) reads have become available. While miRNA identification and discovery tools have improved significantly, the development of miRNA differential expression analysis tools, especially for temporal studies, remains substantially challenging. Further, installing currently available software is non-trivial, and testing with example datasets, trying one's own dataset, and interpreting the results require notable expertise and time. Consequently, there is a strong need for a tool that allows scientists to normalize raw data, perform statistical analyses, and obtain intuitive results without investing significant effort.

Findings: We have developed miRNA Temporal Analyzer (mirnaTA), a bioinformatics package to identify differentially expressed miRNAs in temporal studies. mirnaTA is written in Perl and R (Version 2.13.0 or later) and can be run across multiple platforms, such as Linux, Mac and Windows. In the current version, mirnaTA requires users to provide a simple, tab-delimited matrix file containing miRNA names and count data from a minimum of two to a maximum of 20 time points and three replicates. To recalibrate the data and remove technical variability, raw data are normalized using Normal Quantile Transformation (NQT), and a linear regression model is used to locate any miRNAs that are differentially expressed in a linear pattern. The remaining miRNAs, which do not fit a linear model, are then analyzed by one of two non-linear methods: 1) cumulative distribution function (CDF) or 2) analysis of variance (ANOVA). After both linear and non-linear analyses are completed, statistically significant miRNAs (P < 0.05) are plotted as heat maps using hierarchical cluster analysis and Euclidean distance matrix computation methods.

Conclusions: mirnaTA is an open-source bioinformatics tool to aid scientists in identifying differentially expressed miRNAs, which could be further mined for biological significance. It is expected to provide researchers with a means of turning raw data into statistical summaries in a fast and intuitive manner.
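
The input format and normalization step described above can be illustrated with a minimal Python sketch. It is not mirnaTA's own code (the package is written in Perl and R). It makes several assumptions: the matrix has a header row, each column is normalized separately across miRNAs, and NQT is taken to be the common rank-based inverse normal transform with probabilities (rank - 0.5) / n. The miRNA names and counts are made up for illustration.

```python
from statistics import NormalDist

def read_count_matrix(text):
    """Parse a tab-delimited matrix: miRNA name, then one count per column.
    Assumes (hypothetically) that the first line is a header row."""
    rows = {}
    for line in text.strip().splitlines()[1:]:
        name, *counts = line.split("\t")
        rows[name] = [float(c) for c in counts]
    return rows

def normal_quantile_transform(values):
    """Map values to standard-normal quantiles by rank.
    Tied values share the average of their ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    # (rank - 0.5) / n keeps every probability strictly inside (0, 1)
    return [std_normal.inv_cdf((r - 0.5) / n) for r in ranks]

def normalize_columns(rows):
    """Apply the transform to each column (assumed: one column per sample)."""
    names = list(rows)
    n_cols = len(rows[names[0]])
    out = {name: [] for name in names}
    for c in range(n_cols):
        column = [rows[name][c] for name in names]
        for name, z in zip(names, normal_quantile_transform(column)):
            out[name].append(z)
    return out

# Made-up example data, not taken from the paper
example = (
    "miRNA\tT1\tT2\tT3\n"
    "miR-a\t10\t200\t35\n"
    "miR-b\t500\t20\t35\n"
    "miR-c\t70\t90\t1000\n"
)
for name, zs in normalize_columns(read_count_matrix(example)).items():
    print(name, [round(z, 3) for z in zs])
```

After this step every column has the same set of quantile values regardless of its original scale, which is the sense in which a quantile-based normalization removes technical variability between samples.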

Highlights

  • Understanding the biological roles of microRNAs is an active area of research that has produced a surge of publications in PubMed, particularly in cancer research

  • MicroRNAs are short, single-stranded, noncoding RNAs approximately 19–22 nucleotides long, which are critical regulators of gene expression and have been implicated in a wide range of physiological processes, such as apoptosis and growth, as well as pathological processes, including inflammatory responses, cancer, neurodegenerative and cardiovascular diseases [1,2,3,4,5,6,7]. The rapid growth of the field is evident from the exponentially increasing number of miRNAs reported in the recent Release 21 (June 2014) of miRBase [8,9], which contains 35,828 mature miRNA products in 223 different species. Accompanying this growth is the development of many miRNA discovery bioinformatics tools, including, but not limited to, miRscan [10], miRFinder [11], miRDeep [12] and miRanalyzer [13], to help researchers identify miRNAs from existing miRNA databases and/or predict novel miRNAs from next-generation sequencing (NGS) data

  • miRNA Temporal Analyzer (mirnaTA) is an open-source bioinformatics tool that can be run on Linux, Mac or Windows with Perl and R package dependencies


Summary

Introduction

Understanding the biological roles of microRNAs (miRNAs) is an active area of research that has produced a surge of publications in PubMed, particularly in cancer research. mirnaTA performs a number of distinct steps (Figure 1): (i) normalization of the raw count data into quantiles using Normal Quantile Transformation (NQT); (ii) analysis of the NQT data to locate any miRNA species that are either increasing or decreasing linearly, using a linear regression model; (iii) further analysis of miRNA species that did not fit the linear model by one of two methods: (a) the normal distribution function known as the cumulative distribution function (CDF) if the number of time points is ≤ 3, or (b) analysis of variance (ANOVA) if the number of time points is > 3; (iv) generation of heat maps for any miRNAs that are differentially expressed with statistical significance (P < 0.05); and (v) output of the results in intuitive HTML formats.
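
Steps (ii) and (iii) can be sketched in Python as follows. This is an illustrative outline under stated assumptions, not mirnaTA's implementation: the function names are hypothetical, the linear step is ordinary least squares on one normalized value per time point, the ANOVA is the textbook one-way form with replicates grouped by time point, and the P < 0.05 decision (which needs t and F distribution tail probabilities) is left out. How mirnaTA applies the CDF method is not described here, so the sketch only shows which branch would be chosen.

```python
from statistics import mean

def linear_trend(time_points, values):
    """Least-squares slope, intercept and R^2 of values against time."""
    x_bar, y_bar = mean(time_points), mean(values)
    sxx = sum((x - x_bar) ** 2 for x in time_points)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(time_points, values))
    syy = sum((y - y_bar) ** 2 for y in values)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # R^2 is the squared correlation; defined as 0 for a flat profile
    r_squared = (sxy ** 2) / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r_squared

def nonlinear_method(n_time_points):
    """Choose the follow-up method from the number of time points,
    following the rule in the text: CDF for <= 3, ANOVA for > 3."""
    return "CDF" if n_time_points <= 3 else "ANOVA"

def one_way_anova_f(groups):
    """Textbook one-way ANOVA F statistic.
    Each group holds the replicate values of one time point (assumed layout).
    Raises ZeroDivisionError if there is no within-group variation."""
    all_values = [v for group in groups for v in group]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Made-up values for illustration only
slope, intercept, r2 = linear_trend([1, 2, 3, 4], [2, 4, 6, 8])
print(slope, intercept, r2)
print(nonlinear_method(3), nonlinear_method(5))
print(one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]]))
```

A positive slope corresponds to a linearly increasing miRNA and a negative slope to a decreasing one. Profiles with a poor linear fit would be passed on to whichever method `nonlinear_method` selects.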
