Abstract

Many studies have reported that protein folding rates are influenced by the characteristics of amino acid sequences and protein structures. However, few reports have addressed whether the corresponding mRNA sequences are also related to protein folding rates. An mRNA sequence can be regarded as a kind of genetic language, and its vocabulary and phraseology must carry information that influences the protein folding rate. In the present work, linear regressions between the vocabulary and phraseology parameters of mRNA sequences and the corresponding protein folding rates were analyzed. The results indicated that D2 (the adjacent base-related information redundancy) and the GC content of the corresponding mRNA sequences are significantly negatively correlated with protein folding rates, whereas D1 (the single-base information redundancy) is significantly positively correlated with them. In addition, the relationships between these genetic-language parameters and protein folding rates differ markedly among protein groups. Several parameters related to protein folding rates were identified. These results indicate that information from protein structures and amino acid sequences alone is insufficient for predicting protein folding rates, and that some of the information regulating folding rates must be derived from the mRNA sequences.
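The regression analysis summarized in the abstract can be sketched as an ordinary least-squares fit of folding rate against one mRNA parameter at a time (GC content, D1 or D2). This is a minimal illustration, not the authors' code: the helper name `fit_line`, the use of ln(k_f) as the response variable, and the t statistic as the significance measure are assumptions.

```python
import math
from typing import Sequence


def fit_line(x: Sequence[float], y: Sequence[float]) -> tuple[float, float, float, float]:
    """Ordinary least-squares fit y = a + b*x (hypothetical helper).

    x: one mRNA parameter per protein (e.g. GC content, D1 or D2).
    y: folding rate per protein, assumed here to be ln(k_f).
    Returns (slope b, intercept a, Pearson r, t statistic for H0: r = 0).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length sequences with at least 3 points")
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must both vary")
    b = sxy / sxx                    # slope: its sign gives a positive or negative relation
    a = my - b * mx                  # intercept
    r = sxy / math.sqrt(sxx * syy)   # Pearson correlation coefficient
    # t = r * sqrt((n - 2) / (1 - r^2)), compared against t with n - 2 degrees of freedom
    if abs(r) >= 1.0:
        t = math.copysign(math.inf, r)
    else:
        t = r * math.sqrt((n - 2) / (1 - r * r))
    return b, a, r, t
```

A negative slope whose |t| exceeds the critical value for n - 2 degrees of freedom would correspond to the kind of significant negative relation reported for D2 and GC content. Converting t to a p-value needs a t-distribution routine; `scipy.stats.linregress` returns the slope, intercept, r and two-sided p-value directly.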

Highlights

  • Proteins cannot function properly if they do not fold into their individual structures, and inactive proteins may be produced by misfolding (Price et al, 2018; Wangeline and Hampton, 2018; Jo et al, 2019)

  • We selected the GC content of the mRNA sequences, D1 and D2, which represent information about the genetic language of the mRNA sequence, to analyze the relationship between the mRNA sequence and the protein folding rate

  • $C_{GC} = (N_G + N_C)/N$, where $C_{GC}$ is the GC content of an mRNA sequence, $N_G$ and $N_C$ are the numbers of G and C bases, respectively, and $N$ is the total number of bases in the mRNA sequence
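The three parameters named in these highlights can be computed directly from an mRNA sequence. GC content follows $C_{GC} = (N_G + N_C)/N$. For D1 and D2 the text gives only their descriptions (single-base and adjacent-base information redundancy), so the sketch below assumes the classical Gatlin-style definitions, D1 = log2(4) - H1 and D2 = H1 - H_M, where H1 is the single-base Shannon entropy and H_M the entropy of a base conditioned on the preceding base. The function names are hypothetical.

```python
import math
from collections import Counter


def gc_content(seq: str) -> float:
    """C_GC = (N_G + N_C) / N for an mRNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def _entropy(counts, total: int) -> float:
    """Shannon entropy in bits computed from frequency counts."""
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def redundancies(seq: str) -> tuple[float, float]:
    """Return (D1, D2), assuming Gatlin-style information redundancies.

    D1 = log2(4) - H1, where H1 is the single-base entropy.
    D2 = H1 - H_M, where H_M is the conditional entropy of a base given
    the preceding base, estimated from adjacent base pairs.
    """
    seq = seq.upper().replace("T", "U")  # also accept DNA-style input
    if len(seq) < 2:
        raise ValueError("need at least 2 bases")
    h1 = _entropy(Counter(seq).values(), len(seq))
    d1 = math.log2(4) - h1  # maximum entropy for a 4-letter alphabet is 2 bits
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    h_pair = _entropy(Counter(pairs).values(), len(pairs))
    h_prev = _entropy(Counter(p[0] for p in pairs).values(), len(pairs))
    h_markov = h_pair - h_prev  # H(next base | previous base)
    d2 = h1 - h_markov
    return d1, d2
```

Under these assumed definitions, a sequence with uniform base usage has D1 = 0, and a sequence in which each base fully determines the next has the largest possible D2 for its H1.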


Summary

Introduction

Proteins cannot function properly if they do not fold into their individual structures, and inactive proteins may be produced by misfolding (Price et al, 2018; Wangeline and Hampton, 2018; Jo et al, 2019). Since 1998, many studies (Plaxco et al, 1998; Mirny and Shakhnovich, 2001; Zhou and Zhou, 2002; Gong et al, 2003; Kuznetsov and Rackovsky, 2004; Punta and Rost, 2005; Choi, 2020; Li et al, 2020,b) have shown that protein folding rates are related to the corresponding protein structures. Other investigations have predicted protein folding rates from amino acid sequences, demonstrating that a protein's folding rate depends substantially on its amino acid sequence (Ivankov and Finkelstein, 2004; Gromiha, 2005; Gromiha et al, 2006; Ouyang and Liang, 2008; Razban, 2019; Szczepaniak et al, 2019).
