Abstract

BackgroundRNA secondary structure prediction is an important research content in the field of biological information. Predicting RNA secondary structure with pseudoknots has been proved to be an NP-hard problem. Traditional machine learning methods can not effectively apply protein sequence information with different sequence lengths to the prediction process due to the constraint of the self model when predicting the RNA secondary structure. In addition, there is a large difference between the number of paired bases and the number of unpaired bases in the RNA sequences, which means the problem of positive and negative sample imbalance is easy to make the model fall into a local optimum. To solve the above problems, this paper proposes a variable-length dynamic bidirectional Gated Recurrent Unit(VLDB GRU) model. The model can accept sequences with different lengths through the introduction of flag vector. The model can also make full use of the base information before and after the predicted base and can avoid losing part of the information due to truncation. Introducing a weight vector to predict the RNA training set by dynamically adjusting each base loss function solves the problem of balanced sample imbalance.ResultsThe algorithm proposed in this paper is compared with the existing algorithms on five representative subsets of the data set RNA STRAND. The experimental results show that the accuracy and Matthews correlation coefficient of the method are improved by 4.7% and 11.4%, respectively.ConclusionsThe flag vector introduced allows the model to effectively use the information before and after the protein sequence; the introduced weight vector solves the problem of unbalanced sample balance. Compared with other algorithms, the LVDB GRU algorithm proposed in this paper has the best detection results.

Highlights

  • RNA secondary structure prediction is an important research content in the field of biological information

  • The variable-length bi-directional recurrent neural network model has better prediction results than the bi-directional recurrent neural network model on five data sets, which shows that the method of introducing a flag vector is more reasonable and scientific than the simple method of sequence length supplement, and makes the learned model more robust and robust

  • It can be seen that the VLDB Variable-length dynamic bidirectional gated recurrent unit (GRU) model performs well on the data set ASE, but the index of each algorithm decreases compared with that on the data set SPR, which is largely related to the characteristics of the ASE data set itself, because the proteins on the ASE data set are RNase P proteins, i.e., ribonuclease P, which contain a large number of bases, making the prediction of its secondary structure more difficult

Read more

Summary

Introduction

RNA secondary structure prediction is an important research content in the field of biological information. Predicting RNA secondary structure with pseudoknots has been proved to be an NP-hard problem. Traditional machine learning methods can not effectively apply protein sequence information with different sequence lengths to the prediction process due to the constraint of the self model when predicting the RNA secondary structure. There is a large difference between the number of paired bases and the number of unpaired bases in the RNA sequences, which means the problem of positive and negative sample imbalance is easy to make the model fall into a local optimum. Introducing a weight vector to predict the RNA training set by dynamically adjusting each base loss function solves the problem of balanced sample imbalance. RNA molecules have the characteristics of difficult crystallization and fast degradation, the traditional methods of X-ray crystal diffraction and nuclear magnetic resonance to determine the secondary structure are timeconsuming, costly and expensive, and not suitable for all RNA molecules [4]

Methods
Results
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call