HuntMi: an efficient and taxon-specific approach in pre-miRNA identification

Adam Gudyś, Michał Wojciech Szcześniak, Izabela Makałowska, Marek Sikora

doi:10.1186/1471-2105-14-83

Open Access

Abstract

Background

Machine learning techniques are known to be a powerful way of distinguishing microRNA hairpins from pseudo hairpins and have been applied in a number of recognised miRNA search tools. However, many current methods based on machine learning suffer from some drawbacks, including not addressing the class imbalance problem properly. This may lead to overlearning the majority class and/or incorrect assessment of classification performance. Moreover, those tools are effective for a narrow range of species, usually the model ones. This study aims at improving the performance of the miRNA classification procedure, extending its usability and reducing computational time.

Results

We present HuntMi, a stand-alone machine learning miRNA classification tool. We developed a novel method of dealing with the class imbalance problem called ROC-select, which is based on thresholding the score function produced by traditional classifiers. We also introduced new features to the data representation. Several classification algorithms in combination with ROC-select were tested, and random forest was selected for the best balance between sensitivity and specificity. Reliable assessment of classification performance is guaranteed by using large, strongly imbalanced, and taxon-specific datasets in a 10-fold cross-validation procedure. As a result, HuntMi achieves considerably better performance than any other miRNA classification tool and can be applied in miRNA search experiments in a wide range of species.

Conclusions

Our results indicate that HuntMi represents an effective and flexible tool for identification of new microRNAs in animals, plants and viruses. The ROC-select strategy proves to be superior to other methods of dealing with the class imbalance problem and can possibly be used in other machine learning classification tasks. The HuntMi software as well as the datasets used in the research are freely available at http://lemur.amu.edu.pl/share/HuntMi/.
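
The remark about incorrect assessment of classification performance on imbalanced data can be illustrated with a small sketch. The dataset below (10 real hairpins against 990 pseudo hairpins) and the function names are hypothetical and are not taken from the paper. The sketch only shows why plain accuracy can look excellent while sensitivity, and hence the geometric mean of sensitivity and specificity, collapses.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = miRNA)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def summarize(y_true, y_pred):
    """Return accuracy, sensitivity, specificity and their geometric mean."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "g_mean": math.sqrt(sensitivity * specificity),
    }


# Hypothetical, strongly imbalanced data: 10 positives, 990 negatives.
y_true = [1] * 10 + [0] * 990

# A degenerate classifier that has "overlearned" the majority class.
always_negative = [0] * len(y_true)

report = summarize(y_true, always_negative)
print(report)
```

Here the degenerate classifier scores 99% accuracy while detecting no hairpins at all, which is why measures that weigh both classes are more informative on such data.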

Highlights

Machine learning techniques are known to be a powerful way of distinguishing microRNA hairpins from pseudo hairpins and have been applied in a number of recognised miRNA search tools.
The exception is naïve Bayes, for which the gain is moderate. This can be explained by the intrinsic resistance of naïve Bayes to the class imbalance problem: it performed well even without applying ROC-select.
It is important to note that variant II considerably outperforms variant III. This confirms that standard machine learning techniques are not suited to imbalanced datasets and that adjusting classifier parameters reduces the problem of overlearning the majority class only by a small margin.
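
The text describes ROC-select only as thresholding the score function produced by a traditional classifier. A minimal sketch of that general idea follows. The selection criterion used here (the cutoff that maximizes the geometric mean of sensitivity and specificity), the function names, and the example scores are all assumptions made for illustration. They are not the authors' exact procedure.

```python
import math


def roc_points(scores, labels):
    """Return (threshold, sensitivity, specificity) for every distinct score.

    A sample is called positive when its score is >= the threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    points = []
    for threshold in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
        points.append((threshold, tp / positives, tn / negatives))
    return points


def select_threshold(scores, labels):
    """Pick the cutoff with the best sensitivity/specificity balance.

    Assumed criterion: maximal geometric mean of the two measures.
    """
    best = max(roc_points(scores, labels), key=lambda p: math.sqrt(p[1] * p[2]))
    return best[0]


def classify(scores, threshold):
    """Turn classifier scores into 0/1 labels using the given cutoff."""
    return [1 if s >= threshold else 0 for s in scores]


# Hypothetical scores from a classifier trained on imbalanced data:
# positives receive low scores because the majority class dominates.
scores = [0.45, 0.40, 0.30, 0.35, 0.20, 0.15, 0.10, 0.10, 0.05, 0.05, 0.02, 0.01]
labels = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

threshold = select_threshold(scores, labels)
print("selected threshold:", threshold)
print("default 0.5 cutoff:", classify(scores, 0.5))
print("selected cutoff:   ", classify(scores, threshold))
```

With the conventional 0.5 cutoff every sample in this toy set is labelled negative, whereas the selected cutoff recovers all three positives at the cost of one false positive. In practice the threshold would be chosen on training data only and then applied to held-out folds.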

Summary

Introduction

Machine learning techniques are known to be a powerful way of distinguishing microRNA hairpins from pseudo hairpins and have been applied in a number of recognised miRNA search tools. Mature miRNAs are derived from longer precursors called pre-miRNAs, which fold into hairpin structures containing one or more mature miRNAs in one or both arms [2]. Their biogenesis is highly regulated at both the transcriptional and post-transcriptional levels. Experimental approaches, including direct cloning and Northern blot, are usually able to detect only abundant miRNAs. MicroRNAs that are expressed at very low levels or in a tissue- or stage-specific manner often remain undetected. These problems are partially addressed by deep sequencing techniques, which require extensive computational analyses to distinguish miRNAs from other non-coding RNAs or products of RNA degradation [5].

Journal: BMC Bioinformatics
Publication Date: Mar 5, 2013
Citations: 72
License type: CC BY 2.0
