Abstract

Prediction of a protein secondary structure gives useful guidance for the prediction of its full three-dimensional structure and function. For successful prediction of the secondary structure from its amino-acid sequence, it is important to analyze the correlation between the sequence and the secondary structure patterns and to extract features from the sequence that play crucial roles in determining the protein structure. As a first step toward this goal, we try to find a reduced set of the alphabet for amino acids that contains information relevant for the secondary structure. The amino acids are divided arbitrarily into two groups to obtain two-letter representations of the amino acids. Then, the correlation between patterns of the two-letter sequence and the secondary structure within sliding windows of a given length is measured using mutual information for protein chains collected from a structural database. Through an exhaustive investigation of 2^19 possible two-letter representations, we find the one with the highest value of mutual information. The procedure of division into two groups is then repeated to find four-letter representations of the amino acids with maximal sequence-structure correlation. The physical meaning of this automatic grouping is investigated and it is found that the pattern of hydrophobicity and hydrophilicity plays an important role in determining the secondary structure, as well as amino acids with special side chains, such as glycine and proline.
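
The procedure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the estimator, the window length, or the structure alphabet, so the function names (`reduce_sequence`, `window_mutual_information`, `two_group_splits`), the default `window=3`, the pooling of windows over all chains, and the comparison of whole secondary-structure window patterns are all assumptions made for illustration. The count of 2^19 follows from pinning one amino acid to a fixed group so that swapping the two group labels is not counted twice.

```python
from collections import Counter
from itertools import combinations
from math import log2

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def reduce_sequence(seq, group_one):
    """Map each residue to '1' if it belongs to group_one, otherwise '0'."""
    return "".join("1" if aa in group_one else "0" for aa in seq)


def window_mutual_information(chains, group_one, window=3):
    """Mutual information in bits between two-letter sequence windows and
    secondary-structure windows, pooled over all chains.

    chains: iterable of (sequence, structure) string pairs of equal length.
    The window length and the pooling scheme are assumptions of this sketch.
    """
    joint = Counter()
    for seq, ss in chains:
        reduced = reduce_sequence(seq, group_one)
        for i in range(len(seq) - window + 1):
            joint[(reduced[i:i + window], ss[i:i + window])] += 1

    total = sum(joint.values())
    if total == 0:
        return 0.0

    # Marginal counts of sequence patterns and structure patterns.
    seq_counts, ss_counts = Counter(), Counter()
    for (x, y), n in joint.items():
        seq_counts[x] += n
        ss_counts[y] += n

    # I(X;Y) = sum_xy p(x,y) * log2( p(x,y) / (p(x) p(y)) )
    return sum(
        (n / total) * log2(n * total / (seq_counts[x] * ss_counts[y]))
        for (x, y), n in joint.items()
    )


def two_group_splits(alphabet=AMINO_ACIDS):
    """Yield the second group of every two-group split of the alphabet.

    The first letter is pinned to the first group, so each split is counted
    once regardless of group labels: 2**(len(alphabet) - 1) splits in total,
    including the trivial one where the second group is empty.
    """
    rest = alphabet[1:]
    for size in range(len(rest) + 1):
        for members in combinations(rest, size):
            yield frozenset(members)
```

With a list of `(sequence, structure)` pairs in hand, an exhaustive search in this sketch would be `max(two_group_splits(), key=lambda g: window_mutual_information(chains, g))`. Applying the same search again within each resulting group is one way to read the four-letter step, although the abstract does not spell out how the repetition is carried out.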
