Abstract

MicroRNA (miRNA) are short, non-coding RNAs involved in cell regulation at post-transcriptional and translational levels. Numerous computational predictors of miRNA have been developed that generally classify miRNA based on either sequence- or expression-based features. While these methods are highly effective, they require large labelled training data sets, which are unavailable for many species. Simultaneously, emerging high-throughput wet-lab experimental procedures are producing large unlabelled data sets of genomic sequence and RNA expression profiles. Existing methods use supervised machine learning and are therefore unable to leverage these unlabelled data. In this paper, we design and develop a multi-view co-training approach for the classification of miRNA to maximize the utility of unlabelled training data by taking advantage of multiple views of the problem. Starting with only 10 labelled training exemplars, co-training is shown to significantly (p < 0.01) increase the classification accuracy of both sequence- and expression-based classifiers, without requiring any new labelled training data. After 11 iterations of co-training, the expression-based view of miRNA classification experiences an average increase in AUPRC of 15.81% over six species, compared to 11.90% for self-training and 4.84% for passive learning. Similar results are observed for sequence-based classifiers, with increases of 46.47%, 39.53% and 29.43% for co-training, self-training, and passive learning, respectively. The co-trained sequence- and expression-based classifiers are integrated into a final confidence-based classifier, which shows improved performance compared to both the expression (1.5%, p = 0.021) and sequence (3.7%, p = 0.006) views. This study represents the first application of multi-view co-training to miRNA prediction and shows great promise, particularly for understudied species with few available training data.

Highlights

  • MicroRNA are involved in cell regulation at the post-transcriptional and translational levels through the degradation and translation inhibition of messenger RNA

  • The application of 11 iterations of co-training to the chicken data set increased the average AUPRC of the expression- and sequence-based classifiers by 4.12% and 3.80%, respectively

  • We propose a novel multi-view co-training approach for the classification of miRNA


Summary

Introduction

MicroRNA (miRNA) are involved in cell regulation at the post-transcriptional and translational levels through the degradation and translation inhibition of messenger RNA (mRNA). The majority of current de novo and NGS-based miRNA prediction techniques use supervised learning methods for the detection of novel miRNA, thereby requiring a large database of known miRNA. These methods do not always achieve high accuracy, resulting in many sequences being falsely predicted to be miRNA. We aim to minimize the number of labelled training exemplars required, and to improve the performance of current prediction methods using fewer samples. In this way, we can create a more efficient method for the identification of novel miRNA for species with few known miRNA, while maximizing the return on investment for costly wet-lab validation experiments. Semi-supervised machine learning methods make use of both labelled and unlabelled data for classification; such methods are designed to work in situations where we have a small number of known exemplars and a large body of unlabelled data.
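
To make the semi-supervised idea concrete, the following is a minimal sketch of two-view co-training in Python. It is not the authors' implementation: the nearest-centroid classifier, the selection rule (the two most confident unlabelled examples per view per iteration), the "trust the more confident view" integration rule, and all function names are assumptions chosen for illustration, and the data are synthetic rather than real sequence or expression features. Only the overall shape (a seed set of 10 labelled exemplars, 11 iterations, two views, a final confidence-based combination) follows the text.

```python
import math
import random


def fit_centroids(X, y):
    """Toy classifier: one mean vector per class (stand-in for a real model)."""
    centroids = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def prob_positive(centroids, x):
    """Pseudo-probability of class 1 from the distances to both centroids."""
    d0 = math.dist(x, centroids[0])
    d1 = math.dist(x, centroids[1])
    return d0 / (d0 + d1) if d0 + d1 > 0 else 0.5


def train_view(view, labelled):
    """Fit the toy classifier on one view using the current labelled set."""
    ids = sorted(labelled)
    return fit_centroids([view[i] for i in ids], [labelled[i] for i in ids])


def co_train(seq_view, expr_view, seed_labels, unlabelled_ids,
             iterations=11, per_view=2):
    """Grow a shared labelled set from each view's most confident predictions."""
    labelled = dict(seed_labels)  # id -> 0/1, starts as the small seed set
    pool = set(unlabelled_ids) - set(labelled)
    for _ in range(iterations):
        # Train one model per view on the current labelled set.
        models = [(train_view(v, labelled), v) for v in (seq_view, expr_view)]
        for model, view in models:
            scores = {i: prob_positive(model, view[i]) for i in pool}
            # Most confident = predicted probability farthest from 0.5.
            picked = sorted(sorted(pool), key=lambda i: abs(scores[i] - 0.5),
                            reverse=True)[:per_view]
            for i in picked:
                labelled[i] = int(scores[i] >= 0.5)  # pseudo-label, may be wrong
                pool.discard(i)
    return (train_view(seq_view, labelled),
            train_view(expr_view, labelled), labelled)


def combined_predict(seq_model, expr_model, seq_x, expr_x):
    """Assumed integration rule: trust whichever view is more confident."""
    p_seq = prob_positive(seq_model, seq_x)
    p_expr = prob_positive(expr_model, expr_x)
    best = p_seq if abs(p_seq - 0.5) >= abs(p_expr - 0.5) else p_expr
    return int(best >= 0.5)


def make_toy_data(n=200, seed=0):
    """Synthetic two-view data with two features per view; NOT miRNA features."""
    rng = random.Random(seed)
    truth = {i: i % 2 for i in range(n)}
    seq = {i: [rng.gauss(4 * truth[i], 1) for _ in range(2)] for i in range(n)}
    expr = {i: [rng.gauss(4 * truth[i], 1) for _ in range(2)] for i in range(n)}
    return seq, expr, truth
```

A typical call would build the toy data with `make_toy_data()`, pass the first 10 identifiers with their true labels as `seed_labels`, and pass the remaining identifiers as `unlabelled_ids`. Because pseudo-labels from one view enter the training set of the other, each view can benefit from examples it would not have labelled confidently on its own, which is the property that distinguishes co-training from single-view self-training.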

