Modular decomposition of protein structure using community detection

William P Grant, Sebastian E Ahnert

doi:10.1093/comnet/cny014

Open Access

Abstract

As the number of solved protein structures increases, the opportunities for meta-analysis of this dataset increase too. Protein structures are known to be formed of domains: structural and functional subunits that are often repeated across sets of proteins. These domains generally form compact, globular regions, and are therefore often easily identifiable by inspection, yet the problem of automatically fragmenting the protein into these compact substructures remains computationally challenging. Existing domain classification methods focus on finding subregions of protein structure that are conserved, rather than finding a decomposition which spans the full protein structure. However, such a decomposition would find ready application in coarse-graining molecular dynamics, in analysing the protein’s topology, in de novo protein design and in fitting electron microscopy maps. Here, we present a tool for performing this modular decomposition using the Infomap community detection algorithm. The protein structure is abstracted into a network in which its amino acids are the nodes, and where the edges are generated using a simple proximity test. Infomap can then be used to identify highly intra-connected regions of the protein. We perform this decomposition systematically across 4000 distinct protein structures, taken from the Protein Data Bank. The decomposition obtained correlates well with existing PFAM sequence classifications, but has the advantage of spanning the full protein, with the potential for novel domains. The coarse-grained network formed by the communities can also be used as a proxy for protein topology at the single-chain level; we demonstrate that grouping these proteins by their coarse-grained network results in a functionally significant classification.
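The network abstraction described above can be illustrated with a minimal sketch. It assumes one representative 3D coordinate per residue and a plain distance cutoff as the proximity test; the 8 Å default, the function names `contact_network` and `coarse_grain`, and the toy coordinates are all assumptions made for illustration, not the authors' settings. The community labels are written by hand here and merely stand in for the output of a community detection algorithm such as Infomap.

```python
import math
from itertools import combinations


def contact_network(coords, cutoff=8.0):
    """Return undirected edges (i, j), i < j, linking residues whose
    representative coordinates lie within `cutoff` (assumed, in angstroms)."""
    edges = set()
    for (i, a), (j, b) in combinations(enumerate(coords), 2):
        if math.dist(a, b) <= cutoff:
            edges.add((i, j))
    return edges


def coarse_grain(edges, community):
    """Collapse residue-level edges into community-level edges.

    `community[i]` is the community label of residue i. The result maps
    each pair of distinct communities to the number of residue contacts
    between them; contacts inside a community are ignored.
    """
    weights = {}
    for i, j in edges:
        ci, cj = community[i], community[j]
        if ci != cj:
            key = (min(ci, cj), max(ci, cj))
            weights[key] = weights.get(key, 0) + 1
    return weights


# Toy data (hypothetical): two compact clusters of three residues each.
coords = [(0, 0, 0), (3, 0, 0), (0, 3, 0),
          (10, 0, 0), (13, 0, 0), (10, 3, 0)]
# Hand-written labels standing in for a community detection result.
community = [0, 0, 0, 1, 1, 1]

edges = contact_network(coords)
coarse = coarse_grain(edges, community)
print(sorted(edges))
print(coarse)
```

In this toy case each cluster is fully connected internally and only two contacts cross between the clusters, so the coarse-grained network reduces to a single weighted edge between two communities.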

Highlights

All proteins are formed of chains of covalently bonded amino acids
We see that a scaling parameter of approximately 4 gives communities corresponding to compact, globular regions of the protein structure (Fig. 3)
We test the correspondence between the known PFAM domains, and the generated community structure
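One simple way to quantify a correspondence between annotated domains and detected communities is a best-match Jaccard index, sketched below. This is an illustrative measure chosen for the sketch and not necessarily the one used in the paper; the function name `best_match_overlap`, the domain names and the residue ranges are hypothetical.

```python
def best_match_overlap(communities, domains):
    """For each annotated domain (a set of residue indices), return the
    highest Jaccard index it achieves against any detected community."""
    scores = {}
    for name, dom in domains.items():
        scores[name] = max(
            len(dom & com) / len(dom | com) for com in communities
        )
    return scores


# Hypothetical example: two communities covering a 120-residue chain,
# and two annotated domains that cover only part of the chain.
communities = [set(range(0, 50)), set(range(50, 120))]
domains = {
    "domain_A": set(range(5, 45)),
    "domain_B": set(range(60, 120)),
}

scores = best_match_overlap(communities, domains)
for name, score in scores.items():
    print(f"{name}: {score:.3f}")
```

A score near 1 means a community covers almost exactly the same residues as the annotated domain. Lower scores arise when a community extends beyond the annotation, which is expected when the decomposition spans the full chain and the annotation does not.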

Summary

Introduction

All proteins are formed of chains of covalently bonded amino acids (known as residues). The pattern of non-covalent bonding between units of the chain is what causes the protein to fold into its compact native structure; specifying the sequence of amino acids in a protein is sufficient to uniquely determine its folded shape [1]. This structure allows the protein to carry out its designated role within the cell. Over 130 000 protein structures are publicly available in the Protein Data Bank (PDB) [2], and the size of this dataset is growing exponentially [3].

Methods

Results

Conclusion

Journal: Journal of Complex Networks
Publication Date: Aug 8, 2018
Citations: 7
License type: CC BY 4.0
