Abstract

The free energy (evaluation) models used in RNA secondary structure prediction are among the main reasons that the prediction is a challenging computational problem in bioinformatics. These models are the key factor determining the accuracy of prediction algorithms. We previously developed a method called GAknot, which performs well at predicting RNA secondary structures with pseudoknots. In this paper, we propose a new free energy model. We first select a number of RNA sequences from a database of known RNA secondary structures to serve as a training dataset for learning the new model. From the stems of each sequence in the training dataset, we then extract base-pair patterns in pairs of k-mer subsequences and use these patterns to formulate penalty factors. We modify the energy model by adding these penalty factors. With the modified energy model, the prediction performance of GAknot improves significantly. On a commonly used test dataset, GAknot with the modified energy model outperforms two state-of-the-art algorithms. The penalty factors of the new energy model and the dataset can be downloaded at http://appsrv.cse.cuhk.edu.hk/~kktong/NewModel.
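
To make the extraction step concrete, the following is a minimal sketch of how pairs of k-mers might be counted from the stems of an annotated sequence. It is an illustration under stated assumptions, not the method of the paper: the dot-bracket input format, the function names (`base_pairs`, `stems`, `kmer_pair_counts`, `penalty_factors`), and in particular the negative-log-frequency formula used for the penalty are all hypothetical, since the abstract does not specify how the penalty factors are computed.

```python
from collections import Counter
import math


def base_pairs(structure):
    """Return sorted (i, j) base pairs from a dot-bracket string.

    Assumption: '()' marks ordinary pairs and '[]' marks pseudoknotted pairs.
    """
    closers = {")": "(", "]": "["}
    stacks = {"(": [], "[": []}
    pairs = []
    for j, ch in enumerate(structure):
        if ch in stacks:
            stacks[ch].append(j)
        elif ch in closers:
            i = stacks[closers[ch]].pop()
            pairs.append((i, j))
    return sorted(pairs)


def stems(pairs):
    """Group pairs into stems: maximal runs (i, j), (i+1, j-1), ..."""
    pair_set = set(pairs)
    result = []
    for i, j in pairs:
        if (i - 1, j + 1) in pair_set:
            continue  # not the outermost pair of its stem
        stem = [(i, j)]
        while (stem[-1][0] + 1, stem[-1][1] - 1) in pair_set:
            stem.append((stem[-1][0] + 1, stem[-1][1] - 1))
        result.append(stem)
    return result


def kmer_pair_counts(seq, structure, k):
    """Count (5' k-mer, 3' k-mer) pairs over every window of k stacked pairs."""
    counts = Counter()
    for stem in stems(base_pairs(structure)):
        for s in range(len(stem) - k + 1):
            i, j = stem[s]
            five_prime = seq[i:i + k]           # strand read 5'->3' from i
            three_prime = seq[j - k + 1:j + 1]  # partner strand ending at j
            counts[(five_prime, three_prime)] += 1
    return counts


def penalty_factors(counts):
    """Hypothetical penalty: rarer patterns get a larger value (-log frequency)."""
    total = sum(counts.values())
    return {pattern: -math.log(c / total) for pattern, c in counts.items()}


# Toy training example (made up for illustration, not from the paper's dataset).
counts = kmer_pair_counts("GCGAAACGC", "(((...)))", k=2)
penalties = penalty_factors(counts)
```

In this toy hairpin the single three-pair stem yields two windows of size 2, so two k-mer pairs are counted once each. A real training run would accumulate the counts over all sequences in the training dataset before deriving any penalty.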
