Abstract

Traditional machine learning methods are widely used in the field of RNA secondary structure prediction and have achieved good results. However, with the emergence of large-scale data, deep learning methods have more advantages than traditional machine learning methods. As the number of network layers increases in deep learning, there will often be problems such as increased parameters and overfitting. We used two deep learning models, GoogLeNet and TCN, to predict RNA secondary results. And from the perspective of the depth and width of the network, improvements are made based on the neural network model, which can effectively improve the computational efficiency while extracting more feature information. We process the existing real RNA data through experiments, use deep learning models to extract useful features from a large amount of RNA sequence data and structure data, and then predict the extracted features to obtain each base’s pairing probability. The characteristics of RNA secondary structure and dynamic programming methods are used to process the base prediction results, and the structure with the largest sum of the probability of each base pairing is obtained, and this structure will be used as the optimal RNA secondary structure. We, respectively, evaluated GoogLeNet and TCN models based on 5sRNA, tRNA data, and tmRNA data, and compared them with other standard prediction algorithms. The sensitivity and specificity of the GoogLeNet model on the 5sRNA and tRNA data sets are about 16% higher than the best prediction results in other algorithms. The sensitivity and specificity of the GoogLeNet model on the tmRNA dataset are about 9% higher than the best prediction results in other algorithms. As deep learning algorithms’ performance is related to the size of the data set, as the scale of RNA data continues to expand, the prediction accuracy of deep learning methods for RNA secondary structure will continue to improve.

Highlights

  • RNA’s primary structure is a single-stranded sequence of bases randomly composed of bases A, C, G, and U in a specific order.e single-stranded structure of RNA forms the secondary structure of RNA through the principle of complementary base pair pairing. e secondary structure is folded in space to form a complete three-dimensional structure and exhibit its unique functions [1,2,3,4]

  • E secondary structure of RNA can be divided into three parts, namely, the loop, the unpaired single-stranded free structure spiral region, and the spiral region. e loop can be divided into bulge loop, internal loop, and so on. e helical region refers to collecting these base pairs when all bases in two disjoint, equal-length regions are paired in reverse

  • GoogLeNet model and Temporal convolutional network (TCN) model are compared with the experimental results of Mfold and RNAfold on the same test data

Read more

Summary

Introduction

RNA’s primary structure is a single-stranded sequence of bases randomly composed of bases A, C, G, and U in a specific order.e single-stranded structure of RNA forms the secondary structure of RNA through the principle of complementary base pair pairing. e secondary structure is folded in space to form a complete three-dimensional structure and exhibit its unique functions [1,2,3,4]. RNA’s primary structure is a single-stranded sequence of bases randomly composed of bases A, C, G, and U in a specific order. E single-stranded structure of RNA forms the secondary structure of RNA through the principle of complementary base pair pairing. E secondary structure of RNA can be divided into three parts, namely, the loop, the unpaired single-stranded free structure spiral region, and the spiral region. Loop refers to a single-stranded structure in which unpaired bases are bounded by paired base pairs when forming a helical region [8,9,10]. E definition of a pseudoknot is as follows: in a specific RNA sequence, if there are four bases at a, b, c, and d (a < b < c < d), where a matches c and b matches d, the structure formed by (a, c) and (b, d) base pairs is called a pseudoknot structure [11,12,13,14].

Methods
Results
Discussion
Conclusion
Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call