Abstract

The exhaustive matching of the protein sequence database makes possible a broadly based study of insertions and deletions (indels) during divergent evolution. In this study, the probability of a gap in an alignment of a pair of homologous protein sequences was found to increase with the evolutionary distance measured in PAM units (number of accepted point mutations per 100 amino acid residues). A relationship between the average number of amino acid residues between indels and evolutionary distance suggests that a unit 30 to 40 amino acid residues in length remains, on average, undisrupted by indels during divergent evolution. Further, the probability of a gap was found to be inversely proportional to gap length raised to the 1·7 power. This empirical law fits closely over the entire range of gap lengths examined. Gap length distribution is largely independent of evolutionary distance. These results rule out the widely used linear gap penalty as a satisfactory formula for scoring gaps when constructing alignments. Further, the observed gap length distribution can be explained by a simple model of selective pressures governing the acceptance of indels during divergent evolution. Finally, this model provides theoretical support for using indels as part of "parsing algorithms", important in the de novo prediction of the folded structure of proteins form the sequence data.

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.