Abstract

MicroRNAs (miRNAs) play important roles in post-transcriptional gene regulation and phenotype development. Understanding how miRNA genes are regulated is therefore critical to understanding gene regulation. One of the challenges in studying miRNA gene regulation is the lack of condition-specific annotation of miRNA transcription start sites (TSSs). Unlike the TSSs of protein-coding genes, miRNA TSSs can lie tens of thousands of nucleotides away from the precursor miRNAs and are hard to detect with conventional RNA-Seq experiments. A number of studies have attempted to predict miRNA TSSs computationally. However, high-resolution, condition-specific miRNA TSS prediction remains a challenging problem. Recently, deep learning models have been successfully applied to various bioinformatics problems, but none has been developed effectively for condition-specific miRNA TSS prediction. Here we created a two-stream deep learning model called D-miRT for computational prediction of condition-specific miRNA TSSs (http://hulab.ucf.edu/research/projects/DmiRT/). D-miRT is a natural fit for integrating low-resolution epigenetic features (DNase-Seq and histone modification data) with high-resolution sequence features. Compared with alternative computational models on different sets of training data, D-miRT outperformed all baseline models and demonstrated high accuracy on condition-specific miRNA TSS prediction tasks. Compared with the most recent approaches to cell-specific miRNA TSS identification, using cell lines that were unseen during model training, D-miRT also showed superior performance.

Highlights

  • MicroRNAs are non-coding RNAs approximately 22 nucleotides long

  • Robust Cap Analysis of Gene Expression (CAGE) transcription start site (TSS) peaks with coverage greater than 10 transcripts per kilobase million were downloaded from the FANTOM5 project for these seven cell lines

  • When evaluated on the held-out 10% test data for the respective cell lines, which were not used for training, the D-miRT models for the seven cell lines achieved average precision, recall, and F1 scores of around 92–96% at the default score threshold (Table 1)
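
As a reminder of how the three reported metrics relate to a score threshold, the following is a minimal sketch rather than the authors' evaluation code. The function name `precision_recall_f1` and the default threshold of 0.5 are assumptions for illustration; the source does not state the value of D-miRT's default score threshold.

```python
def precision_recall_f1(scores, labels, threshold=0.5):
    """Precision, recall, and F1 for binary labels at a score threshold.

    `threshold=0.5` is an assumed default, not a value taken from the paper.
    """
    tp = fp = fn = 0
    for score, label in zip(scores, labels):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1  # true TSS window called as TSS
        elif predicted_positive and label == 0:
            fp += 1  # background window called as TSS
        elif not predicted_positive and label == 1:
            fn += 1  # true TSS window missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy scores and labels, invented purely for illustration.
    toy_scores = [0.9, 0.8, 0.4, 0.3, 0.7]
    toy_labels = [1, 1, 1, 0, 0]
    print(precision_recall_f1(toy_scores, toy_labels))
```

Raising the threshold generally trades recall for precision, which is why the threshold in use matters when comparing reported scores.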


Summary

Introduction

MicroRNAs (miRNAs) are non-coding RNAs approximately 22 nucleotides long. They are expressed ubiquitously in almost all cell types, are evolutionarily conserved in most metazoan and plant species, and can regulate more than 30% of mammalian gene products through complementary binding to the corresponding mRNAs [1,2]. Initial computational approaches for genome-wide TSS identification focused on sequence features such as over-represented k-mers, transcription factor binding site enrichment, conservation, and CpG content [9,10,11,12,13]. Such sequence-based strategies often produce a large number of false predictions and cannot identify condition-specific TSSs. Later studies succeeded in predicting miRNA TSSs by using markers of active gene transcription such as trimethylation of Lys 4 of histone 3 (H3K4me3), acetylation of Lys 9/14 of histone 3 (H3K9/14Ac), Polymerase II (Pol II), and DNase-Seq measurements [14,15,16,17,18]. By integrating both sequence features and features from high-throughput sequencing datasets, D-miRT has shown the ability to produce accurate, high-resolution, and condition-specific miRNA TSS predictions.
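
To make two of the sequence features mentioned above concrete, here is a minimal sketch of k-mer counting and CpG content scoring. It is illustrative only and is not the feature pipeline of D-miRT or of the cited studies. The helper names `kmer_counts` and `cpg_obs_exp` are hypothetical, and the observed/expected CpG ratio is used here as one common way to quantify CpG content, which is an assumption about what the earlier methods measured.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer in a DNA sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G).

    Returns 0.0 when the sequence has no C or no G.
    """
    seq = seq.upper()
    c_count, g_count = seq.count("C"), seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    # "CG" cannot overlap itself, so str.count gives the full dinucleotide count.
    cg_count = seq.count("CG")
    return cg_count * len(seq) / (c_count * g_count)


if __name__ == "__main__":
    toy = "ACGCGT"  # toy sequence, invented for illustration
    print(kmer_counts(toy, 2))
    print(cpg_obs_exp(toy))
```

Features of this kind depend only on the genome sequence, so they yield the same value in every cell type, which is consistent with the limitation noted above that purely sequence-based strategies cannot identify condition-specific TSSs.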

