Abstract

Background: One approach for speeding up protein structure comparison is the projection approach, where a protein structure is mapped to a high-dimensional vector and structural similarity is approximated by distance between the corresponding vectors. Structural footprinting methods are projection methods that employ the same general technique to produce the mapping: first select a representative set of structural fragments as models and then map a protein structure to a vector in which each dimension corresponds to a particular model and "counts" the number of times the model appears in the structure. The main difference between any two structural footprinting methods is in the set of models they use; in fact a large number of methods can be generated by varying the type of structural fragments used and the amount of detail in their representation. How do these choices affect the ability of the method to detect various types of structural similarity?

Results: To answer this question we benchmarked three structural footprinting methods that vary significantly in their selection of models against the CATH database. In the first set of experiments we compared the methods' ability to detect structural similarity characteristic of evolutionarily related structures, i.e., structures within the same CATH superfamily. In the second set of experiments we tested the methods' agreement with the boundaries imposed by classification groups at the Class, Architecture, and Fold levels of the CATH hierarchy.

Conclusion: In both experiments we found that the method which uses secondary structure information has the best performance on average, but no one method performs consistently the best across all groups at a given classification level. We also found that combining the methods' outputs significantly improves the performance.
Moreover, our new techniques to measure and visualize the methods' agreement with the CATH hierarchy, including the thresholded affinity graph, are useful beyond this work. In particular, they can be used to expose a similar composition of different classification groups in terms of structural fragments used by the method and thus provide an alternative demonstration of the continuous nature of the protein structure universe.
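
The footprinting idea described in the abstract (pick a set of model fragments, then count how often each model appears in a structure) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: fragments and models are stand-in numeric descriptors rather than real structural fragments, each fragment is assigned to its single nearest model by Euclidean distance, and the names `nearest_model` and `footprint` are hypothetical. The benchmarked methods may select models and "count" occurrences quite differently.

```python
import math
from collections import Counter


def nearest_model(fragment, models):
    """Return the index of the model closest to the fragment.

    Assumption: fragments and models are plain numeric descriptors and
    closeness is Euclidean distance; real methods compare structural fragments.
    """
    return min(range(len(models)), key=lambda i: math.dist(fragment, models[i]))


def footprint(fragments, models):
    """Map a structure (given as a list of fragments) to a count vector.

    Dimension i holds the number of fragments whose nearest model is model i.
    """
    counts = Counter(nearest_model(f, models) for f in fragments)
    return [counts.get(i, 0) for i in range(len(models))]


# Toy data (hypothetical): three models, one "structure" with four fragments.
models = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
fragments = [(0.1, 0.0), (0.9, 1.2), (1.1, 0.8), (4.8, 5.1)]

vec = footprint(fragments, models)
print(vec)  # [1, 2, 1]
```

With hard assignment as above, the entries of the vector sum to the number of fragments; changing the model set changes the vector, which is the design choice the benchmark varies.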

Highlights

  • One approach for speeding up protein structure comparison is the projection approach, where a protein structure is mapped to a high-dimensional vector and structural similarity is approximated by distance between the corresponding vectors

  • It was shown [1] that once vector representations are computed it takes on average 500 seconds for a projection method to perform all pairwise comparisons among 5,024 domains. (Compare this to an estimated four months it would take DALI [2], a highly accurate protein structure comparison method, to perform the same number of pairwise comparisons.) However, the advantage of the projection approach is tied to one of its main limitations; namely, in the process of mapping, some structural information is lost

  • An agreement value close to one means that, from the method's perspective, the corresponding classification group is structurally isolated from other groups, i.e., the composition of its member protein domains in terms of the structural fragments used by the method to model the structure is quite different from that of domains in other groups


Summary

Introduction

One approach for speeding up protein structure comparison is the projection approach, where a protein structure is mapped to a high-dimensional vector and structural similarity is approximated by distance between the corresponding vectors. Once the mapping is done, protein structure comparison is reduced to a distance computation between the corresponding vectors and is very efficient. It was shown [1] that once vector representations are computed it takes on average 500 seconds for a projection method to perform all pairwise comparisons among 5,024 domains. There is no agreement on what constitutes a good projection technique, and currently known projection methods [1,3,4,5,6,7] utilize very different approaches to the mapping construction, both in terms of which structural information is included and how this information is integrated to produce a vector representation. The main difference between any two structural footprinting methods is in the set of models they use; a large number of methods can be generated by varying the type of structural fragments used and the amount of detail in their representation. How do these choices affect the ability of the method to detect various types of structural similarity?
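
To make the efficiency argument concrete, the sketch below counts the comparisons involved and computes pairwise distances between a few toy vectors. It rests on assumptions not stated in the text: "all pairwise comparisons" is read as unordered pairs of distinct domains, Euclidean distance stands in for whatever distance a given projection method uses, and the names `num_pairs` and `pairwise_distances` are hypothetical.

```python
import math
from itertools import combinations


def num_pairs(n):
    """Number of unordered pairs among n items (assumed reading of 'all pairwise')."""
    return n * (n - 1) // 2


def pairwise_distances(vectors):
    """Euclidean distance for every unordered pair of vectors, keyed by index pair.

    Assumption: Euclidean distance is used only for illustration.
    """
    return {
        (i, j): math.dist(vectors[i], vectors[j])
        for i, j in combinations(range(len(vectors)), 2)
    }


# 5,024 domains give 12,617,776 unordered pairs under this reading.
pairs = num_pairs(5024)
print(pairs)  # 12617776

# At the reported 500 seconds in total, that is roughly 40 microseconds per pair.
print(500 / pairs)

# Toy footprint vectors (hypothetical) for three structures.
footprints = [[1, 2, 1], [1, 2, 0], [0, 0, 4]]
dists = pairwise_distances(footprints)
print(dists[(0, 1)])  # 1.0
```

Each comparison is a single pass over the vector dimensions, so the cost depends on the number of models rather than on the size of the structures being compared.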
