Abstract

Position-specific scoring matrices (PSSMs) are useful for detecting weak homology in protein sequence analysis, and they are thought to contain some essential signatures of the protein families. In order to elucidate what kind of ingredients constitute such family-specific signatures, we apply singular value decomposition to a set of PSSMs and examine the properties of dominant right and left singular vectors. The first right singular vectors were correlated with various amino acid indices including relative mutability, amino acid composition in protein interior, hydropathy, or turn propensity, depending on proteins. A significant correlation between the first left singular vector and a measure of site conservation was observed. It is shown that the contribution of the first singular component to the PSSMs act to disfavor potentially but falsely functionally important residues at conserved sites. The second right singular vectors were highly correlated with hydrophobicity scales, and the corresponding left singular vectors with contact numbers of protein structures. It is suggested that sequence alignment with a PSSM is essentially equivalent to threading supplemented with functional information. In addition, singular vectors may be useful for analyzing and annotating the characteristics of conserved sites in protein families.

Highlights

  • Protein sequence alignment using a position-specific scoring matrix (PSSM) or sequence profile [1,2] is a standard tool for sequence analysis[3,4]

  • In order to check to what extent a subset of singular components can explain the original PSSM, we calculated the accumulative contribution ratio of each PSSM

  • Since each element of a right singular vector numerically represents some property of an amino acid type, we can regard a right singular vector of a PSSM as an amino acid index [16,17,8]

Read more

Summary

Introduction

Protein sequence alignment using a position-specific scoring matrix (PSSM) or sequence profile [1,2] is a standard tool for sequence analysis[3,4]. Using a PSSM, it is often possible to detect very distantly related proteins which cannot be detected by the standard pairwise alignment based on a position-independent amino acid substitution matrix (AASM). An AASM is a 20620 real (usually symmetric) matrix each element of which reflects the tendency of substitution between amino acid residues. Tomii and Kanehisa found that the PAM matrices can be well approximated by the volume and hydrophobicity of amino acid residues[8]. A similar result was obtained by Pokarowski et al.[10], but they pointed out the importance of the coil preferences of amino acids residues

Methods
Results
Conclusion
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.