Abstract
Approximately half of known human miRNAs are located in the introns of protein-coding genes. Some of these intronic miRNAs are expressed only when their host gene is, so their steady-state expression levels are highly correlated with those of the host gene's mRNA. Recently, host gene expression levels have been used to predict the targets of intronic miRNAs by identifying other mRNAs with which they are consistently negatively correlated. This is a potentially powerful approach because it allows a large number of expression profiling studies to be used, but it needs refinement because mRNAs can be targeted by multiple miRNAs and not all intronic miRNAs are co-expressed with their host genes.
Here we introduce InMiR, a new computational method that uses a linear-Gaussian model to predict the targets of intronic miRNAs based on the expression profiles of their host genes across a large number of datasets. Our method recovers nearly twice as many true positives at the same fixed false positive rate as a comparable method that only considers correlations. Through an analysis of 140 Affymetrix datasets from Gene Expression Omnibus, we build a network of 19,926 interactions among 57 intronic miRNAs and 3,864 targets. InMiR can also predict which host genes have expression profiles that are good surrogates for those of their intronic miRNAs. Host genes that InMiR predicts are bad surrogates contain significantly more miRNA target sites in their 3′ UTRs and are significantly more likely to have predicted Pol II and Pol III promoters in their introns.
We provide a dataset of 1,935 predicted mRNA targets for 22 intronic miRNAs. These predictions are supported both by sequence features and expression.
By combining our results with previous reports, we distinguish three classes of intronic miRNAs: those that are tightly regulated with their host gene; those that are likely to be expressed from the same promoter but whose host gene is highly regulated by miRNAs; and those likely to have independent promoters.
Highlights
MicroRNAs are a large family of small, non-coding endogenous RNAs that play critical roles in a wide range of normal and disease-related biological processes [1]–[3] by post-transcriptionally repressing the expression of target genes. miRNAs repress gene expression by binding target mRNAs, often in their 3′ UTR. MicroRNAs recognize their targets through partial complementarity; as such, their target mRNA sequences are amenable to computational prediction [4]–[20].
We modeled the change of an mRNA's expression level in a sample by a linear combination of the host gene expression levels of a subset of the miRNAs with potential target sites in the 3′ UTR of the mRNA.
Our linear model is as follows. Given $N$ gene expression datasets $D_i$, $i = 1, \ldots, N$, let $\Delta x_{ig} = \{\Delta x_{itg}\}_{t=1}^{T}$ denote a $T$-element vector whose elements correspond to the decrease in the expression level of the $g$th target gene over $T$ samples in the $i$th dataset. We model this vector as a linear function of $K_g$ intronic miRNAs whose host gene expression levels are denoted by $h_{ikg} = \{h_{itkg}\}_{t=1}^{T}$, $k = 1, \ldots, K_g$.
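To make the linear model concrete, the following is a minimal sketch for a single dataset and a single target gene. It is an illustration under stated assumptions, not InMiR's actual inference procedure. The function name `fit_linear_model`, the weight vector `w`, the simulated data, and the use of ordinary least squares are all assumptions introduced here. The text above only states that the decrease in target expression is modeled as a linear function of the host gene expression levels.

```python
import numpy as np

def fit_linear_model(delta_x, H):
    """Fit delta_x ~ H @ w by ordinary least squares (illustrative assumption).

    delta_x : (T,) decrease in expression of one target gene over T samples
    H       : (T, K) host gene expression levels of K candidate intronic miRNAs
    Returns the (K,) weight vector and the mean squared residual.
    """
    w, *_ = np.linalg.lstsq(H, delta_x, rcond=None)
    residual_var = float(np.mean((delta_x - H @ w) ** 2))
    return w, residual_var

# Simulated data (hypothetical): T samples, K candidate miRNAs for one target.
rng = np.random.default_rng(0)
T, K = 50, 3
H = rng.normal(size=(T, K))          # stand-in for host gene expression
true_w = np.array([0.8, 0.0, 0.3])   # second miRNA has no effect on the target
delta_x = H @ true_w + rng.normal(scale=0.1, size=T)  # Gaussian noise

w_hat, var_hat = fit_linear_model(delta_x, H)
print("estimated weights:", np.round(w_hat, 2))
print("residual variance:", round(var_hat, 4))
```

In this toy setting a weight near zero suggests that the corresponding host gene contributes little to explaining the target's decrease, which is the intuition behind modeling a target with only a subset of candidate miRNAs.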
Summary
MicroRNAs (miRNAs) are a large family of small, non-coding endogenous RNAs that play critical roles in a wide range of normal and disease-related biological processes [1]–[3] by post-transcriptionally repressing the expression of target genes. miRNAs repress gene expression by binding target mRNAs, often in their 3′ UTR. MicroRNAs recognize their targets through partial complementarity; as such, their target mRNA sequences are amenable to computational prediction [4]–[20] (for a recent review of these techniques see [21]). Substantial computational and experimental effort in this area has revealed a number of core predictive sequence features: strong base pairing between the 3′ UTR of mRNAs and the miRNA seed region [22], thermodynamic stability of binding sites [23], evolutionary conservation of binding sites (the seed region) [7], [14], secondary structure accessibility [8], [11], [24]–[26], and dinucleotide composition of flanking sequence [14], [27]. For a comprehensive review of sequence-based features see [29]. Despite these efforts, recent reports claim that even the most accurate miRNA target prediction methods have false positive rates greater than 30% [28], [30], and the limited overlap of their predictions suggests that they have high false negative rates [31]–[33].