Abstract
MicroRNAs are a class of small non-coding RNAs that play an important role in post-transcriptional regulation of gene products. Identifying novel microRNAs is difficult because the set of validated microRNAs is still small and diverse. Existing feature selection methods use different combinations of features related to microRNA biogenesis, but their performance evaluations are not comprehensive. We developed a robust feature selection method that combines three types of nucleotide-structure triplets, the minimum free energy of the secondary structure of precursor microRNAs, and other extracted characteristics. We compared our new combined feature set with three previously published sets using three different classifiers: logistic regression, support vector machine, and random forest. Our proposed feature set was not only robust across all three classifiers, but also had the highest classification performance, as measured by the area under the receiver operating characteristic (ROC) curve.
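To make the idea of a nucleotide-structure triplet concrete, the following is a minimal sketch of one commonly used encoding, in which each interior position contributes its nucleotide together with the paired/unpaired state of itself and its two neighbours, giving 4 x 8 = 32 features. The abstract does not specify which three triplet types the authors used, so this encoding, the function name `triplet_features`, and the dot-bracket input format are assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter
from itertools import product


def triplet_features(seq, structure):
    """Return normalized frequencies of 32 nucleotide-structure triplets.

    seq:       RNA sequence over A, C, G, U (T is treated as U).
    structure: dot-bracket string of the same length, e.g. from a
               secondary-structure predictor (assumed input format).
    """
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have equal length")

    seq = seq.upper().replace("T", "U")
    # Collapse "(" and ")" into a single "paired" symbol; "." stays unpaired.
    struct = structure.replace(")", "(")

    # All 32 keys: middle nucleotide + paired/unpaired state of 3 positions.
    keys = [n + "".join(s) for n in "ACGU" for s in product("(.", repeat=3)]
    valid = set(keys)

    counts = Counter()
    for i in range(1, len(seq) - 1):
        key = seq[i] + struct[i - 1:i + 2]
        if key in valid:  # skip ambiguous bases or unexpected symbols
            counts[key] += 1

    total = sum(counts.values())
    return {k: (counts[k] / total if total else 0.0) for k in keys}


if __name__ == "__main__":
    feats = triplet_features("GCAUGC", "((..))")
    # Print only the triplets that actually occur in this toy hairpin.
    print({k: v for k, v in feats.items() if v > 0})
```

A vector like this would typically be concatenated with further descriptors, such as the minimum free energy of the predicted structure, before being passed to a classifier.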