Abstract
Background: The Structural Classification of Proteins (SCOP) database uses a large number of hidden Markov models (HMMs) to represent families and superfamilies composed of proteins that presumably share the same evolutionary origin. However, how the HMMs are related to one another has not been examined before.

Results: In this work, taking into account the processes used to build the HMMs, we propose a working hypothesis to examine the relationships between HMMs and the families and superfamilies that they represent. Specifically, we perform an all-against-all HMM comparison using the HHsearch program (similar to BLAST) and construct a network where the nodes are HMMs and the edges connect similar HMMs. We hypothesize that the HMMs in a connected component belong to the same family or superfamily more often than expected under a random network connection model. The results show a pattern consistent with this working hypothesis. Moreover, the HMM network possesses features distinctly different from those of previously documented biological networks, exemplified by its exceptionally high clustering coefficient and large number of connected components.

Conclusions: The current finding may provide guidance in devising computational methods to reduce the degree of overlap between the HMMs representing the same superfamilies, which may in turn enable more efficient large-scale sequence searches against the database of HMMs.
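The network construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the node names, the numeric similarity scores, and the cutoff of 0.9 are hypothetical, since the abstract does not state which HHsearch output or threshold defines an edge. The sketch builds an undirected graph from pairwise hits, extracts connected components, and computes the average local clustering coefficient.

```python
from itertools import combinations


def build_graph(nodes, hits, threshold):
    """Return an adjacency dict; an edge joins two HMMs whose score >= threshold.

    `hits` is an iterable of (hmm_a, hmm_b, score) tuples. The score and the
    threshold are illustrative assumptions, not values from the paper.
    """
    adj = {node: set() for node in nodes}
    for a, b, score in hits:
        if a != b and score >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def connected_components(adj):
    """Return a list of node sets, one per connected component (iterative DFS)."""
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return components


def average_clustering(adj):
    """Average local clustering coefficient; nodes of degree < 2 contribute 0."""
    if not adj:
        return 0.0
    total = 0.0
    for neighbours in adj.values():
        k = len(neighbours)
        if k < 2:
            continue
        # Count edges that exist among the neighbours of this node.
        links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
        total += 2.0 * links / (k * (k - 1))
    return total / len(adj)


# Toy data: five hypothetical HMMs and made-up pairwise scores.
toy_nodes = ["A", "B", "C", "D", "E"]
toy_hits = [
    ("A", "B", 0.99),
    ("B", "C", 0.97),
    ("A", "C", 0.95),
    ("C", "D", 0.40),  # below the cutoff, so no edge
    ("D", "E", 0.96),
]
toy_graph = build_graph(toy_nodes, toy_hits, threshold=0.9)
toy_components = connected_components(toy_graph)
toy_clustering = average_clustering(toy_graph)
```

On the toy data, the triangle A, B, C forms one component and D, E another. Treating nodes of degree below 2 as contributing zero is one common convention for the average clustering coefficient; other conventions exclude such nodes from the average.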
Highlights
The Structural Classification of Proteins (SCOP) database uses a large number of hidden Markov models (HMMs) to represent families and superfamilies composed of proteins that presumably share the same evolutionary origin
At the lowest level, individual proteins are clustered into families based on some criteria that may indicate their common evolutionary origin, such as having a pairwise sequence similarity of more than 30% or lower sequence similarity but similar functions and structures
In order to see how the superfamilies are represented in terms of connected components, we examined the number of HMMs representing the 1163 superfamilies to see how many CCs the HMMs are dispersed into
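The dispersion count mentioned in the last highlight can be sketched as follows. This is a hedged illustration, not the authors' code: the HMM names, the superfamily labels (`sf1`, `sf2`), and the component ids are hypothetical, and the function assumes that each HMM has already been assigned to one superfamily and one connected component (CC).

```python
from collections import defaultdict


def components_per_superfamily(hmm_to_superfamily, hmm_to_component):
    """Map each superfamily to the number of distinct CCs its HMMs fall into.

    Both arguments are dicts keyed by HMM name; the labels used below are
    illustrative placeholders, not real SCOP identifiers.
    """
    spread = defaultdict(set)
    for hmm, superfamily in hmm_to_superfamily.items():
        spread[superfamily].add(hmm_to_component[hmm])
    return {superfamily: len(ccs) for superfamily, ccs in spread.items()}


# Toy assignment: four hypothetical HMMs, two superfamilies, three CCs.
toy_superfamily = {"hmm1": "sf1", "hmm2": "sf1", "hmm3": "sf1", "hmm4": "sf2"}
toy_component = {"hmm1": 0, "hmm2": 0, "hmm3": 1, "hmm4": 2}
toy_spread = components_per_superfamily(toy_superfamily, toy_component)
```

Here the HMMs of `sf1` are split across two components, while `sf2` sits in a single component. A count of 1 means all HMMs of that superfamily are linked, directly or indirectly, in the network.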
Summary
The Structural Classification of Proteins (SCOP) database is a comprehensive protein database that organizes and classifies proteins based on their evolutionary and structural relationships [1,2,3]. It uses a large number of hidden Markov models (HMMs) to represent families and superfamilies composed of proteins that presumably share the same evolutionary origin. SCOP is organized into four hierarchical levels: family, superfamily, fold, and class. At the lowest level (family), individual proteins are clustered based on criteria that may indicate a common evolutionary origin, such as a pairwise sequence similarity of more than 30%, or lower sequence similarity combined with similar functions and structures. Superfamilies are clustered into folds if they share major secondary structures with the same topological arrangements.