Abstract

Information about protein structure and function is encoded in amino acid sequences. However, the relationship between primary sequences and three-dimensional structures and functions remains enigmatic. Our approach to this fundamental biochemical problem is based on the frequencies of short constituent sequences (SCSs), or words. A protein amino acid sequence is considered analogous to an English sentence, in which SCSs are equivalent to words. Availability scores, defined as the actual frequencies of SCSs in the non-redundant amino acid sequence database relative to their probabilistically expected frequencies, reveal the biological usage bias of SCSs. This frequency-based linguistic approach is therefore expected to have diverse applications, such as specifying secondary structures with structure-specific SCSs and designing immunological adjuvants with rare or non-existent SCSs. The rank-frequency relationships of SCSs and of words reveal linguistic similarities (e.g., scale-free distributions over wide ranges) and dissimilarities (e.g., the behavior of low-rank samples) between proteins and natural English. We have developed a web server, the SCS Package, which contains five applications for analyzing protein sequences based on this linguistic concept. These tools may help researchers decipher structurally and functionally important protein sites, species-specific sequences, and functional relationships between SCSs. The SCS Package also provides a tool for constructing amino acid sequences de novo based on the idiomatic usage of SCSs.
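The availability score described above is an observed-over-expected ratio. The following is a minimal, hypothetical sketch that assumes a zero-order background model: the expected count of an SCS is the number of length-k windows multiplied by the product of the overall frequencies of its residues. The function names `scs_counts` and `availability_scores` and the toy sequences are illustrative only; the SCS Package's exact normalization and database handling may differ.

```python
from collections import Counter
from math import prod


def scs_counts(sequences, k):
    """Count every overlapping length-k SCS (a "word") across all sequences."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def availability_scores(sequences, k):
    """Observed SCS count divided by the count expected if residues were independent.

    Assumption (not from the source): expected count =
    (number of length-k windows) * product of single-residue frequencies.
    Only SCSs that occur are returned; an absent SCS would score 0.
    """
    residues = Counter("".join(sequences))
    total_residues = sum(residues.values())
    freq = {aa: n / total_residues for aa, n in residues.items()}

    observed = scs_counts(sequences, k)
    n_windows = sum(observed.values())

    scores = {}
    for scs, count in observed.items():
        expected = n_windows * prod(freq[aa] for aa in scs)
        scores[scs] = count / expected
    return scores


if __name__ == "__main__":
    toy = ["MKVLAAGG", "MKVLSTGG"]  # hypothetical toy sequences
    scores = availability_scores(toy, 2)
    # GG: 2 observed vs 0.875 expected, so a score of about 2.29
    print(round(scores["GG"], 2))
```

Scores well above 1 flag over-used SCSs, while scores near 0 (or SCSs missing from the output) flag rare or non-existent ones. A higher-order background model, such as a Markov chain over residues, would be a different but equally plausible choice of expectation.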

Highlights

  • In the mid-20th century, molecular biology revolutionized the biological sciences through the discovery that functional information about a protein is coded in that protein’s amino acid sequence and nowhere else

  • A protein amino acid sequence is considered analogous to an English sentence, where short constituent sequences (SCSs) are equivalent to words

  • We have developed a web server, the SCS Package, which contains five applications for analyzing protein sequences based on the linguistic concept


Summary

Introduction

In the mid-20th century, molecular biology revolutionized the biological sciences through the discovery that functional information about a protein is coded in that protein’s amino acid sequence and nowhere else. A protein amino acid sequence is considered analogous to an English sentence, where SCSs are equivalent to words. There is no doubt that alignment-based programs are very powerful tools for examining relationships among proteins. However, even when two proteins have very different amino acid sequences, their folded structures may be similar.
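The word analogy can be made concrete by tallying SCSs the way one tallies words in a text and examining their rank-frequency relationship, where scale-free behavior appears as a roughly straight line on log-log axes. The following is a hypothetical sketch: `rank_frequency` and `loglog_slope` are illustrative names, and a plain least-squares fit on log-log axes is a simple stand-in, not necessarily the analysis the authors used.

```python
import math
from collections import Counter


def rank_frequency(sequences, k):
    """Return SCS counts sorted from most to least frequent (rank 1 first)."""
    counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return [c for _, c in counts.most_common()]


def loglog_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank).

    Requires at least two ranks. A straight log-log line suggests a
    scale-free (Zipf-like) distribution; word frequencies in English
    text give a slope near -1 (Zipf's law).
    """
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


if __name__ == "__main__":
    # An ideal Zipf distribution (frequency proportional to 1/rank) has slope -1.
    ideal = [1000 / r for r in range(1, 51)]
    print(round(loglog_slope(ideal), 6))
```

Applied to real SCS counts, comparing the fitted line with the observed points shows over which rank range the distribution is scale-free and where it departs from that behavior.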

