Abstract

Initial protein structural comparisons were sequence-based. Since amino acids that are distant in the sequence can be close in the 3-dimensional (3D) structure, 3D contact approaches can complement sequence approaches. Traditional 3D contact approaches study 3D structures directly and are alignment-based. Instead, 3D structures can be modeled as protein structure networks (PSNs). Then, network approaches can compare proteins by comparing their PSNs. These approaches can be alignment-based or alignment-free. We focus on the latter. Existing network alignment-free approaches have four drawbacks: 1) they rely on naive measures of network topology; 2) they are not robust to PSN size; 3) they cannot integrate multiple PSN measures; and 4) they cannot integrate PSN data with sequence data, although doing so could improve comparison because the different data types capture complementary aspects of the protein structure. We address these drawbacks by: 1) exploiting well-established graphlet measures via a new network alignment-free approach, 2) introducing normalized graphlet measures to remove the bias of PSN size, 3) allowing for integrating multiple PSN measures, and 4) using ordered graphlets to combine the complementary PSN data and sequence (specifically, residue order) data. We compare synthetic networks and real-world PSNs more accurately and faster than existing network (alignment-free and alignment-based), 3D contact, or sequence approaches.
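
To make the idea of size-normalized graphlet measures concrete, the following is a minimal sketch, not the paper's implementation. It counts only the two 3-node graphlets (open path and triangle) and, as an assumption, normalizes by dividing each count by their total; the abstract does not state the actual normalization. The function names and the ring-lattice generator are hypothetical helpers introduced for illustration.

```python
from itertools import combinations


def ring_lattice(n, k):
    """Hypothetical toy model: n nodes on a ring, each linked to its k nearest
    neighbors on either side. Returns {node: set(neighbors)}."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for d in range(1, k + 1):
            j = (i + d) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj


def count_3node_graphlets(adj):
    """Return (open_paths, triangles) counted as induced 3-node subgraphs."""
    open_paths = 0
    triangle_corners = 0
    for center, nbrs in adj.items():
        for a, b in combinations(nbrs, 2):
            if b in adj[a]:
                triangle_corners += 1  # each triangle is seen from all 3 corners
            else:
                open_paths += 1        # an open path is seen only from its center
    return open_paths, triangle_corners // 3


def normalized_frequencies(counts):
    """Assumed normalization: divide each count by the total number of graphlets."""
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


if __name__ == "__main__":
    for n in (10, 40):
        counts = count_3node_graphlets(ring_lattice(n, 2))
        print(n, counts, normalized_frequencies(counts))
    # 10 (30, 10) [0.75, 0.25]
    # 40 (120, 40) [0.75, 0.25]
```

Under these assumptions, ring lattices of 10 and 40 nodes give different raw counts but the same normalized vector, which is the kind of robustness to network size that the abstract refers to.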

Highlights

  • Our results reveal that ordered graphlet counting (OrderedGraphlet-3-4) is 5.5 times slower than regular graphlet counting (Graphlet-3-4).

  • By normalizing the graphlet measures, we improve upon our non-normalized graphlet measures (Fig. 6 and Supplementary Tables S11 and S12).

  • We aim to identify graphlet patterns that lead to successful distinction of different CATH or SCOP label categories from the PSN data, focusing as an illustration on the PSN sets containing networks of the same size (CATH-95, CATH-99, and CATH-251-265) from α or β protein domain labels.
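
As a rough illustration of how ordered graphlets can combine network topology with residue order, the sketch below splits the 3-node open path into three variants according to whether its center residue comes first, in the middle, or last in the sequence. This is a toy under stated assumptions, not the OrderedGraphlet-3-4 method itself: node labels are assumed to be residue indices, and the function and category names are hypothetical.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build {residue_index: set(neighbor indices)} from an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def count_ordered_3node_graphlets(adj):
    """Count 3-node graphlets, splitting open paths by the sequence position
    of the center residue relative to the two end residues."""
    counts = {
        "path_center_first": 0,
        "path_center_middle": 0,
        "path_center_last": 0,
        "triangle": 0,
    }
    for center, nbrs in adj.items():
        for a, b in combinations(sorted(nbrs), 2):  # sorted, so a < b
            if b in adj[a]:
                counts["triangle"] += 1  # seen once from each of the 3 corners
            elif center < a:
                counts["path_center_first"] += 1
            elif center > b:
                counts["path_center_last"] += 1
            else:
                counts["path_center_middle"] += 1
    counts["triangle"] //= 3
    return counts


if __name__ == "__main__":
    # Four residues forming a cycle: backbone 0-1-2-3 plus a contact between 0 and 3.
    cycle = build_adjacency([(0, 1), (1, 2), (2, 3), (0, 3)])
    print(count_ordered_3node_graphlets(cycle))
```

In this example an unordered count would report four identical open paths, whereas the ordered count separates them into one path centered on the first residue, two centered on a middle residue, and one centered on the last. Tracking these extra categories is also one plausible reason ordered counting costs more than regular counting.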


Summary

Objectives

We aim to analyze a synthetic network set in which all networks are of the same size but have different labels. We also aim to analyze networks of different sizes and different labels, to check whether an approach correctly identifies: 1) networks from the same model as similar, even though they are of different sizes, and 2) networks from different models as dissimilar, even though they are of the same size. Finally, we aim to identify graphlet patterns that lead to successful distinction of different CATH or SCOP label categories from the PSN data, focusing as an illustration on the PSN sets containing networks of the same size (CATH-95, CATH-99, and CATH-251-265) from α or β protein domain labels.

