Abstract

Background: Recent studies have revealed the importance of considering the entire distribution of possible secondary structures in RNA secondary structure prediction; consequently, new types of estimators have been proposed, including the maximum expected accuracy (MEA) estimator. MEA-based estimators are designed to maximize the expected accuracy of the base-pairs and have achieved the highest level of accuracy. These methods, however, do not give a single best prediction of the structure; instead, they employ parameters that control the trade-off between sensitivity and positive predictive value (PPV). It is unclear which parameter value should be used, and even a well-trained default value does not, in general, give the best result for each RNA sequence under popular accuracy measures.

Results: The expected values of popular accuracy measures for RNA secondary structure prediction are difficult to calculate. Instead, the pseudo-expected accuracy, which can easily be computed from base-pairing probabilities, is introduced. It is shown that the pseudo-expected accuracy is a good approximation in terms of sensitivity, PPV, Matthews correlation coefficient (MCC) and F-score. The pseudo-expected accuracy can be approximately maximized for each RNA sequence by stochastic sampling. It is also shown that secondary structures well balanced between sensitivity and PPV can be predicted with small computational overhead by combining the pseudo-expected accuracy of MCC or F-score with the γ-centroid estimator.

Conclusions: This study provides not only a method for predicting secondary structures that balance sensitivity and PPV, but also a general method for approximately maximizing the (pseudo-)expected accuracy with respect to various evaluation measures, including MCC and F-score.
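As a minimal sketch of the idea, assuming the pseudo-expected accuracy is obtained by substituting the expected counts of TP, FP, FN and TN (computed from base-pairing probabilities) into an ordinary accuracy formula, selecting a structure could look like the Python below. The probability dict keyed by pairs (i, j) with i < j, the choice to count every pair i < j as a candidate for TN (no minimum loop length), and the names `pseudo_expected_counts`, `pseudo_expected_accuracy` and `select_structure` are all hypothetical illustrations, not the authors' code. In practice the candidates would come from, for example, stochastic sampling or γ-centroid predictions at several γ values.

```python
import math


def pseudo_expected_counts(bpp, structure, n):
    """Return (TP, FP, FN, TN) with each count replaced by its expectation.

    bpp: dict mapping a pair (i, j), i < j, to its base-pairing probability.
    structure: set of predicted pairs (i, j), i < j.
    n: sequence length; every pair i < j is a candidate (assumption).
    """
    # Expected TP: probability mass carried by the predicted pairs.
    tp = sum(bpp.get(pair, 0.0) for pair in structure)
    # Predicted pairs that are not true pairs are false positives.
    fp = len(structure) - tp
    # Expected number of reference pairs is the total probability mass.
    fn = sum(bpp.values()) - tp
    # Remaining candidate pairs are expected true negatives.
    tn = n * (n - 1) / 2 - tp - fp - fn
    return tp, fp, fn, tn


def _ratio(num, den):
    # Treat 0/0 as 0 so that an empty prediction scores 0.
    return num / den if den > 0 else 0.0


def pseudo_expected_accuracy(bpp, structure, n, measure="mcc"):
    """Plug the expected counts into SEN, PPV, F-score or MCC."""
    tp, fp, fn, tn = pseudo_expected_counts(bpp, structure, n)
    if measure == "sen":
        return _ratio(tp, tp + fn)
    if measure == "ppv":
        return _ratio(tp, tp + fp)
    if measure == "f":
        return _ratio(2 * tp, 2 * tp + fp + fn)
    if measure == "mcc":
        den = math.sqrt(max(0.0, (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
        return _ratio(tp * tn - fp * fn, den)
    raise ValueError(f"unknown measure: {measure}")


def select_structure(bpp, candidates, n, measure="mcc"):
    """Pick the candidate with the highest pseudo-expected accuracy."""
    return max(candidates, key=lambda s: pseudo_expected_accuracy(bpp, s, n, measure))


# Toy example: 8 bases, a few pairing probabilities, three nested candidates.
bpp = {(0, 7): 0.9, (1, 6): 0.8, (2, 5): 0.3, (0, 5): 0.05}
candidates = [{(0, 7)}, {(0, 7), (1, 6)}, {(0, 7), (1, 6), (2, 5)}]
for m in ("sen", "ppv", "f", "mcc"):
    print(m, sorted(select_structure(bpp, candidates, n=8, measure=m)))
```

On this toy input, pseudo-expected SEN favours the largest candidate and PPV the smallest, while F-score and MCC both pick the two-pair structure, which is the kind of balance between sensitivity and PPV the abstract describes.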

Highlights

  • Recent studies have revealed the importance of considering the entire distribution of possible secondary structures in RNA secondary structure prediction; new types of estimators have been proposed, including the maximum expected accuracy (MEA) estimator

  • To address the drawbacks of current MEA-based methods, we introduce the pseudo-expected accuracy of a secondary structure with respect to a given accuracy measure, where the measure is a function of the numbers of true-positive (TP), true-negative (TN), false-positive (FP) and false-negative (FN) base-pairs

  • We employed sensitivity (SEN), positive predictive value (PPV), Matthews correlation coefficient (MCC) and F-score with respect to base-pairs, defined by Eqs. (5), (6), (7) and (8), respectively, where s is a predicted structure and θ is a reference structure
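Eqs. (5) to (8) are not reproduced in this summary, so the block below assumes the standard base-pair definitions, with s_ij and θ_ij as 0/1 indicators of whether (i, j) is a pair in s and θ, p_ij as the base-pairing probability, and all sums over i < j. Because each count is linear in θ_ij, replacing θ_ij by p_ij gives the exact expected counts; the measures themselves are nonlinear ratios, which is presumably why the result is called a pseudo-expected rather than an expected accuracy.

```latex
\begin{aligned}
\mathrm{TP} &= \sum_{i<j} s_{ij}\,\theta_{ij}, &
\mathrm{FP} &= \sum_{i<j} s_{ij}\,(1-\theta_{ij}), \\
\mathrm{FN} &= \sum_{i<j} (1-s_{ij})\,\theta_{ij}, &
\mathrm{TN} &= \sum_{i<j} (1-s_{ij})(1-\theta_{ij}), \\[4pt]
\mathrm{SEN} &= \frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}}, &
\mathrm{PPV} &= \frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FP}}, \\[4pt]
\mathrm{F} &= \frac{2\,\mathrm{TP}}{2\,\mathrm{TP}+\mathrm{FP}+\mathrm{FN}}, &
\mathrm{MCC} &= \frac{\mathrm{TP}\cdot\mathrm{TN}-\mathrm{FP}\cdot\mathrm{FN}}
  {\sqrt{(\mathrm{TP}+\mathrm{FP})(\mathrm{TP}+\mathrm{FN})(\mathrm{TN}+\mathrm{FP})(\mathrm{TN}+\mathrm{FN})}}, \\[8pt]
\widetilde{\mathrm{TP}} &= \sum_{i<j} s_{ij}\,p_{ij}, &
\widetilde{\mathrm{FP}} &= \sum_{i<j} s_{ij}\,(1-p_{ij}), \\
\widetilde{\mathrm{FN}} &= \sum_{i<j} (1-s_{ij})\,p_{ij}, &
\widetilde{\mathrm{TN}} &= \sum_{i<j} (1-s_{ij})(1-p_{ij}), \\[4pt]
\widetilde{\mathrm{Acc}}(s) &= \mathrm{Acc}\bigl(\widetilde{\mathrm{TP}}, \widetilde{\mathrm{TN}}, \widetilde{\mathrm{FP}}, \widetilde{\mathrm{FN}}\bigr).
\end{aligned}
```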


Summary

Introduction

Recent studies have revealed the importance of considering the entire distribution of possible secondary structures in RNA secondary structure prediction; new types of estimators have been proposed, including the maximum expected accuracy (MEA) estimator. MEA-based estimators are designed to maximize the expected accuracy of the base-pairs and have achieved the highest level of accuracy. These methods, however, do not give a single best prediction of the structure; instead, they employ parameters that control the trade-off between sensitivity and positive predictive value (PPV). It is unclear which parameter value should be used, and even a well-trained default value does not, in general, give the best result for each RNA sequence under popular accuracy measures. Well-known software (Mfold [13], RNAfold [14] and RNAstructure [15])
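To illustrate how a single parameter trades sensitivity against PPV, here is a minimal sketch of the γ-centroid estimator. It assumes the estimator can be computed by maximizing the sum of (γ + 1)p_ij − 1 over the pairs of a pseudoknot-free structure with a Nussinov-style dynamic program. The probability dict, the `min_loop` constraint (at least 3 unpaired bases inside a pair) and the name `gamma_centroid` are illustrative assumptions, not the interface of any particular tool.

```python
def gamma_centroid(bpp, n, gamma, min_loop=3):
    """Pseudoknot-free structure maximizing sum of (gamma + 1) * p_ij - 1.

    bpp: dict mapping (i, j), i < j, to a base-pairing probability.
    min_loop: minimum number of unpaired bases enclosed by a pair (assumed).
    """
    # Only pairs with positive weight, i.e. p_ij > 1 / (gamma + 1), can help.
    w = {pair: (gamma + 1) * p - 1 for pair, p in bpp.items()}
    w = {pair: v for pair, v in w.items() if v > 0}

    dp = [[0.0] * n for _ in range(n)]

    def val(a, b):
        # Empty or single-base intervals contribute nothing.
        return dp[a][b] if a < b else 0.0

    def options(i, j):
        # Yield (score, k) for every allowed pair (i, k) inside [i, j].
        for k in range(i + min_loop + 1, j + 1):
            if (i, k) in w:
                yield w[(i, k)] + val(i + 1, k - 1) + val(k + 1, j), k

    for length in range(1, n):
        for i in range(n - length):
            j = i + length
            best = val(i + 1, j)  # base i left unpaired
            for score, _ in options(i, j):
                best = max(best, score)
            dp[i][j] = best

    # Traceback: recover the pairs that realize dp[0][n - 1].
    pairs, stack = set(), [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        if dp[i][j] == val(i + 1, j):
            stack.append((i + 1, j))
            continue
        for score, k in options(i, j):
            if score == dp[i][j]:
                pairs.add((i, k))
                stack += [(i + 1, k - 1), (k + 1, j)]
                break
    return pairs


# Toy helix whose pairing probabilities fade towards the loop.
bpp = {(0, 11): 0.9, (1, 10): 0.7, (2, 9): 0.3, (3, 8): 0.1}
for gamma in (0.5, 1.0, 4.0, 32.0):
    print(gamma, sorted(gamma_centroid(bpp, 12, gamma)))
```

With these toy probabilities the prediction grows from 2 pairs (γ = 0.5 and 1) to 3 (γ = 4) and 4 (γ = 32): raising γ admits lower-probability pairs, which tends to increase sensitivity at the expense of PPV, and the estimator by itself gives no rule for choosing γ.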

