Abstract
Background: RNA secondary structure prediction is an important issue in structural bioinformatics, and RNA pseudoknotted secondary structure prediction represents an NP-hard problem. Recently, many different machine-learning methods, Markov models, and neural networks have been employed for this problem, with encouraging results regarding their predictive accuracy; however, their performance is usually limited by over-fitting and by the learning model's requirement of a fixed number of training features. Because most natural biological sequences have variable lengths, the sequences have to be truncated before the features are employed by the learning model, which not only leads to the loss of information but also destroys biological-sequence integrity.
Results: To address this problem, we propose a deep-learning model that adapts to sequence length and integrate an energy-based filter to remove the over-fitted base pairs.
Conclusions: Comparative experiments conducted on the authoritative RNA STRAND dataset (RNA secondary STRucture and statistical Analysis Database) revealed a 12% higher accuracy relative to three currently used methods.
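The truncation problem described in the Background can be made concrete with a small sketch. It contrasts a fixed-length encoding, which discards every base beyond a cutoff, with a pad-and-mask encoding that keeps each sequence whole. This is an illustration only, not the authors' model: the function names, the `ACGU` one-hot layout, and the mask convention are assumptions introduced here.

```python
from typing import List, Tuple

ALPHABET = "ACGU"  # assumed RNA alphabet; any other symbol encodes as all zeros


def one_hot(base: str) -> List[int]:
    """Return a 4-dimensional one-hot vector for a single base."""
    return [1 if base == symbol else 0 for symbol in ALPHABET]


def truncate_to_fixed(seq: str, fixed_len: int) -> List[List[int]]:
    """Fixed-length encoding: bases beyond fixed_len are lost."""
    clipped = seq[:fixed_len]
    rows = [one_hot(b) for b in clipped]
    # Shorter sequences are zero-padded up to fixed_len.
    rows += [[0] * len(ALPHABET) for _ in range(fixed_len - len(clipped))]
    return rows


def pad_batch(seqs: List[str]) -> Tuple[List[List[List[int]]], List[List[int]]]:
    """Length-preserving encoding: pad to the longest sequence in the batch.

    Returns the encoded batch and a mask (1 = real base, 0 = padding) that a
    downstream model could use to ignore the padded positions.
    """
    if not seqs:
        return [], []
    max_len = max(len(s) for s in seqs)
    encoded, masks = [], []
    for s in seqs:
        pad = max_len - len(s)
        encoded.append([one_hot(b) for b in s]
                       + [[0] * len(ALPHABET) for _ in range(pad)])
        masks.append([1] * len(s) + [0] * pad)
    return encoded, masks


if __name__ == "__main__":
    batch, mask = pad_batch(["GGCAU", "AU"])
    print(len(batch[0]), len(batch[1]), mask[1])
    print(len(truncate_to_fixed("GGCAU", 3)))  # the last two bases are dropped
```

With padding and a mask, no base is removed from the input; with a fixed cutoff, everything past the cutoff never reaches the model.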
Highlights
Ribonucleic Acid (RNA) is a carrier of genetic information, and its structure plays a crucial role in gene maturation, regulation, and function [1,2,3]
The dataset used in this paper comes from the authoritative RNA STRAND database (RNA secondary STRucture and statistical Analysis Database) [32] and includes five subsets: TMR (The tmRNA website [33]), SPR (Sprinzl tRNA Database [34]), SRP (Signal recognition particle database [35]), RFA (The RNA family database [36]), and ASE (RNase P Database [37]). The five subsets contain 2493 sequences in total; the maximum and average sequence lengths are 553 and 267.37, respectively
Comparison between adaptive long short-term memory (LSTM) with and without the energy-based filter: to demonstrate the validity of the energy-based filter, a comparative experiment was carried out on the five datasets
Summary
RNA is a carrier of genetic information, and its structure plays a crucial role in gene maturation, regulation, and function [1,2,3]. The secondary structure of an RNA molecule represents base-pair interactions that fundamentally determine overall structure [9,10,11]. RNA secondary structure prediction in the absence of pseudoknots has been studied using dynamic programming algorithms described by Zuker [14] and Mathews [15, 16] and employing m-fold [17] and GT-fold [18]. RNA secondary structure prediction is an important issue in structural bioinformatics, and RNA pseudoknotted secondary structure prediction represents an NP-hard optimization problem [19]. Because most natural biological sequences have variable lengths, the sequences have to be truncated before the features are employed by the learning model, which leads to the loss of information and destroys biological-sequence integrity.
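As an illustration of the dynamic-programming approach to pseudoknot-free prediction mentioned above, the following is a minimal sketch of the classic Nussinov base-pair maximization recurrence. It is a deliberately simplified stand-in: the algorithms of Zuker and Mathews minimize free energy with thermodynamic parameters, whereas this sketch only counts pairs. The allowed pair set (Watson-Crick plus G-U wobble) and the minimum hairpin-loop size of 3 are assumptions chosen for the example.

```python
from typing import Tuple

# Watson-Crick pairs plus the G-U wobble pair (assumed pairing rules).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov(seq: str, min_loop: int = 3) -> Tuple[int, str]:
    """Maximize the number of nested (pseudoknot-free) base pairs.

    Returns the pair count and one optimal structure in dot-bracket notation.
    """
    n = len(seq)
    if n == 0:
        return 0, ""
    # best[i][j] = maximum number of pairs within seq[i..j] (inclusive).
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # case 1: base j stays unpaired
            for k in range(i, j - min_loop):  # case 2: base j pairs with k
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score

    # Trace back one optimal structure.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue  # interval too short to contain a pair
        if best[i][j] == best[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS:
                left = best[i][k - 1] if k > i else 0
                if left + 1 + best[k + 1][j - 1] == best[i][j]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return best[0][n - 1], "".join(structure)


if __name__ == "__main__":
    print(nussinov("GGGAAAUCC"))  # (3, '(((...)))')
```

The recurrence runs in cubic time because every pair it considers is nested inside or beside the others. Pseudoknots break that nesting, which is why the general pseudoknotted problem cannot be handled by this kind of table.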