Abstract

Because proteins that have diverged beyond significant sequence similarity still retain the three-dimensional (3D) fold of their ancestor (Chothia and Lesk, 1986; Rost, 1997), the recognition of structural similarity between proteins provides powerful clues to ancestry. In fact, a large number of distant homology relationships were identified only after the structures of the proteins had been solved (Murzin, 1998). However, structures are being determined for only a small fraction of proteins. There is therefore a pressing need to improve the performance of sequence-based methods for detecting proteins that share a fold but have scant sequence similarity. Here, we examine how to achieve this goal by combining three kinds of information from a protein sequence. First, it has long been recognized that the use of multiply aligned sequences from a protein family improves the sensitivity of homology detection. This idea is used by many recent computational procedures that exploit evolutionary information to uncover subtle sequence similarity. Examples of such procedures include sequence profiles (Gribskov et al., 1987), consensus templates or motifs (Taylor, 1986; Bairoch, 1991; Tatusov et al., 1994; Yi and Lander, 1994), position-specific scoring matrices (PSSMs) (Henikoff and Henikoff, 1997), profile hidden Markov models (Eddy, 1998), and intermediate sequence methods (Holm and Sander, 1997; Neuwald et al., 1997; Park et al., 1997). PSI-BLAST (Altschul et al., 1997), one of the most widely used of these procedures, employs an iterative profile search strategy that combines the advantages of both PSSM and intermediate sequence methods. This program has been used effectively by several groups to assign 3D folds to predicted genome products (Teichmann et al., 1999). Second, proteins having the same fold also, by definition, have very similar secondary structures.
In light of the improved accuracy of secondary structure prediction (Rost and Sander, 1993), several groups have attempted to use sequence-derived predictions to improve the sensitivity of fold recognition (Fischer and Eisenberg, 1996; Russel et al., 1996; Di Francesco et al., 1997; Rice and Eisenberg, 1997; Rost et al., 1997). These methods usually represent each protein in a template library by a one-dimensional (1D) string of symbols (profiles), each representing a distinctive 3D structural state, and then use dynamic programming (Needleman and Wunsch, 1970) to align the predicted structural profiles of the query
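
To make the dynamic-programming step mentioned above concrete, the following is a minimal sketch of global (Needleman and Wunsch style) alignment applied to two 1D secondary-structure strings. It is illustrative only and is not the procedure of any of the cited methods: the three-letter alphabet (H, E, C), the function name needleman_wunsch, the example strings, and the match, mismatch, and gap scores are all assumptions chosen for simplicity, and published methods use richer, position-dependent scoring.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align two 1D structural-state strings.

    The scoring values are illustrative assumptions, not taken from any paper.
    Returns (score, aligned_a, aligned_b), with '-' marking gaps.
    """
    n, m = len(a), len(b)

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1]
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the table row by row
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align the two states
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    # Hypothetical predicted string for a query and observed string for a template
    predicted = "CCHHHHHHCCEEEECC"
    template = "CHHHHHHHCEEEEEC"
    s, x, y = needleman_wunsch(predicted, template)
    print(s)
    print(x)
    print(y)
```

In a fold-recognition setting, a query would be scored this way against every template string in the library and the templates ranked by alignment score; the uniform gap penalty used here is the simplest possible choice.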
