Abstract

Most natural protein sequences have resulted from millions or even billions of years of evolution. How they differ from random sequences is not fully understood. Previous computational and experimental studies of random proteins generated from noncoding regions yielded inconclusive results due to species-dependent codon biases and GC content. Here, we approach this problem by investigating 10,000 sequences randomized at the amino acid level. Using well-established predictors for protein intrinsic disorder, we found that natural sequences have more long disordered regions than random sequences, even when random and natural sequences have the same overall composition of amino acid residues. We also showed that random sequences are as structured as natural sequences according to the contents and length distributions of predicted secondary structure, although the structures from random sequences may be in a molten globule-like state, according to molecular dynamics simulations. The bias of natural sequences toward more intrinsic disorder suggests that natural sequences were created and evolved to avoid protein aggregation and increase functional diversity.
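
The idea of randomizing a sequence at the amino acid level while keeping its overall composition can be illustrated with a minimal sketch. It assumes, purely for illustration, that a composition-matched random sequence is obtained by shuffling the residues of a natural sequence; the function name `shuffle_sequence`, the seed, and the example peptide are hypothetical and are not taken from the study.

```python
import random
from collections import Counter
from typing import Optional


def shuffle_sequence(seq: str, seed: Optional[int] = None) -> str:
    """Return a random permutation of seq (same residues, random order)."""
    rng = random.Random(seed)  # local generator so the seed does not leak globally
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


# Hypothetical example peptide, not a sequence from the study.
natural = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
randomized = shuffle_sequence(natural, seed=0)

# Shuffling preserves length and amino acid composition exactly.
assert len(randomized) == len(natural)
assert Counter(randomized) == Counter(natural)
```

Because the shuffled sequence has exactly the same residue counts as its source, any difference in predicted disorder between the two can be attributed to residue order rather than composition.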

Introduction

Proteins are linear polymeric chains made of a combination of 20 different types of amino acid residues. The total number of proteins explored by nature since the origin of life is estimated to be between 10^21 and 10^43 [1]. Random co-polymerization of mixed amino-acid N-carboxyanhydrides was shown to produce compact structures similar to proteins [6, 7]. Random sequences of 70–90 amino acid residues built from three residue types (Q, R, and L) were expressed in E. coli and shown to have secondary structures and cooperative unfolding [8]. Further studies indicate that random 120-residue sequences of 20 residue types are aggregation-prone, whereas sequences of 12 residue types have better solubility [9].
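
To put these numbers in perspective, the number of possible sequences of length N drawn from k residue types is k^N. The following is a minimal sketch of that arithmetic for the alphabet sizes mentioned above. The function name is hypothetical, and the length of 80 for the three-residue case is an assumption chosen from within the quoted 70–90 range.

```python
import math


def log10_sequence_space(n_residue_types: int, length: int) -> float:
    """Return log10 of the number of possible sequences, i.e. log10(k**N)."""
    return length * math.log10(n_residue_types)


# (residue types, chain length) pairs loosely following the studies cited above.
for k, n in [(3, 80), (12, 120), (20, 120)]:
    exponent = log10_sequence_space(k, n)
    print(f"{k} residue types, length {n}: about 10^{exponent:.1f} sequences")
```

For a 120-residue chain over all 20 residue types this gives roughly 10^156 possible sequences, far beyond the upper estimate of proteins explored by nature quoted above, which is why natural sequences can only be a tiny sample of sequence space.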
