Abstract

This work presents an identification tool for plant precursor miRNAs (pre-miRNAs) that uses structural robustness and derivative features to improve discrimination of plant pre-miRNAs from pseudo pre-miRNAs. The classification models were trained on the plant pre-miRNA and pseudo-hairpin datasets from the PlantMiRNAPred web site. The top 20 features were selected from four groups of features: sequence-based features, secondary structure features, base-pair features, and a self-containment index score. In particular, the self-containment index score was found to be the most informative of the 20 selected features. Ten-fold cross-validation was applied to choose the best-performing classifier among Support Vector Machine, Random Forest, Decision Tree, Naive Bayes, K-nearest neighbor, Back-propagation Neural Network, Ripper, and RBF network, based on ROC area and accuracy. The results demonstrated that the Random Forest model using the 20 selected features achieved 97% accuracy and 94% sensitivity on test sets in discriminating real plant pre-miRNAs from other sequences.
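The evaluation protocol described above (ten-fold cross-validation, with accuracy and sensitivity as reported metrics) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `stratified_k_folds` and `accuracy_and_sensitivity` are hypothetical, the stratification by class is an assumption, and no classifier or feature extraction is included.

```python
import random


def stratified_k_folds(labels, k=10, seed=0):
    """Split sample indices into k folds with roughly equal class proportions.

    Hypothetical helper: stratifying by class is an assumption, not a
    detail stated in the abstract.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]

    # Group sample indices by class label.
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    # Shuffle each class, then deal its indices round-robin into the folds.
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def accuracy_and_sensitivity(y_true, y_pred, positive=1):
    """Return (accuracy, sensitivity) for binary predictions.

    Sensitivity is TP / (TP + FN): the fraction of real pre-miRNAs
    (the positive class) that were predicted as positive.
    """
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == positive and p == positive)
    false_neg = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    actual_pos = true_pos + false_neg
    sensitivity = true_pos / actual_pos if actual_pos else 0.0
    return accuracy, sensitivity
```

In use, each fold would serve once as the held-out set while a candidate classifier is trained on the remaining nine, and the per-fold metrics would be averaged to compare classifiers.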
