Atomic Motif Recognition in (Bio)Polymers: Benchmarks From the Protein Data Bank.

Benjamin A Helfrecht, Piero Gasparotto, Michele Ceriotti, Federico Giberti

doi:10.3389/fmolb.2019.00024


Open Access

https://doi.org/10.3389/fmolb.2019.00024


Abstract

Rationalizing the structure and structure–property relations for complex materials such as polymers or biomolecules relies heavily on the identification of local atomic motifs, e.g., hydrogen bonds and secondary structure patterns, that are seen as building blocks of more complex supramolecular and mesoscopic structures. Over the past few decades, several automated procedures have been developed to identify these motifs in proteins given the atomic structure. Being based on a very precise understanding of the specific interactions, these heuristic criteria formulate the question in a way that implies the answer, by defining a list of motifs based on those that are known to be naturally occurring. This makes them less likely to identify unexpected phenomena, such as the occurrence of recurrent motifs in disordered segments of proteins, and less suitable to be applied to different polymers whose structure is not driven by hydrogen bonds, or even to polypeptides when appearing in unusual, non-biological conditions. Here we discuss how unsupervised machine learning schemes can be used to recognize patterns based exclusively on the frequency with which different motifs occur, taking high-resolution structures from the Protein Data Bank as benchmarks. We first discuss the application of a density-based motif recognition scheme in combination with traditional representations of protein structure (namely, interatomic distances and backbone dihedrals). Then, we proceed one step further toward an entirely unbiased scheme by using as input a structural representation based on the atomic density and by employing supervised classification to objectively assess the role played by the representation in determining the nature of atomic-scale patterns.

Highlights

Macromolecules are characterized by their capability of folding and assembling into hierarchical structures, which is a crucial element in their activity and stability.
The analysis protocols that we have discussed above identify the presence of significant motifs based exclusively on how often a given local atomistic environment occurs in a reference dataset. While this procedure makes it possible to rely on simple and rather generic descriptors of local structure, it still requires a dose of chemical intuition, i.e., it is necessary to know the basis of hydrogen bonding and that dihedral angles can be used to identify the secondary structure of a protein.
Given that the Smooth Overlap of Atomic Positions (SOAP) representation can be tuned to encompass environments of different sizes and provide a complete description of the correlation between atomic positions, it gives us an opportunity to verify whether any discrepancy between the Probabilistic Analysis of Molecular Motifs (PAMM) classification and the reference heuristics is due to the fact that the truncated representations that we use are incomplete, or due to the fact that the reference heuristics are not reflected in the probability distribution of motifs in the PDB.
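
The backbone dihedral angles mentioned in the highlights above can be computed from the positions of four consecutive atoms. The following is a minimal sketch using NumPy, not the authors' code: the function name `dihedral` and the example coordinates are illustrative assumptions, and the sign convention is the common one in which a clockwise rotation viewed along the central bond is positive.

```python
import numpy as np

def dihedral(p0, p1, p2, p3):
    """Return the dihedral angle (degrees, in (-180, 180]) defined by four points.

    Hypothetical helper for illustration; for a protein backbone the four
    points would be, e.g., C(i-1), N(i), CA(i), C(i) for the phi angle.
    """
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1          # vector from the second atom back to the first
    b1 = p2 - p1          # central bond
    b2 = p3 - p2          # vector from the third atom to the fourth
    b1 = b1 / np.linalg.norm(b1)
    # Components of b0 and b2 perpendicular to the central bond
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))

if __name__ == "__main__":
    # Made-up coordinates: the fourth atom is rotated a quarter turn
    # about the central bond relative to the first.
    print(dihedral([1, 0, 0], [0, 0, 0], [0, 0, 1], [0, 1, 1]))
```

Using `arctan2` rather than `arccos` keeps the sign of the angle and avoids numerical problems near 0 and 180 degrees.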

Summary

INTRODUCTION

Macromolecules are characterized by their capability of folding and assembling into hierarchical structures, which is a crucial element in their activity and stability. Rosetta, one of the best-known energy functions, has been developed to predict the structure of a protein given its amino acid sequence and local structural features such as dihedral angles (Simons et al, 1997, 1999). Another example where purely data-driven definitions would be advantageous is in secondary structure classification. While several methods exist to classify protein secondary structure (Kabsch and Sander, 1983; Frishman and Argos, 1995, 1996; Jones, 1999; Cuff and Barton, 2000; Andersen et al, 2002; Martin et al, 2005; Nagy and Oostenbrink, 2014; Haghighi et al, 2016), these methods rely on amino acid sequences, hydrogen bonding energies, geometrical criteria, or some combination thereof. By comparing the fidelity of the unsupervised classification given by PAMM with that of a supervised scheme, we can assess whether classification errors stem from an incomplete representation or are a manifestation of the arbitrary nature of heuristic methods.
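
To make the idea of density-based motif recognition concrete, the sketch below estimates a kernel density at each sample and links every sample to its nearest neighbour of higher density, so that each density maximum becomes one cluster. It is a simplified illustration of this general family of methods and not the PAMM implementation: the function name `density_mode_clusters`, the parameters `bandwidth` and `max_jump`, and the synthetic data are all assumptions, and plain Euclidean distances are used (periodic variables such as dihedral angles would need a periodic metric).

```python
import numpy as np

def density_mode_clusters(points, bandwidth, max_jump):
    """Assign each point to a density maximum (hypothetical helper).

    points    : (n, d) array of descriptor values
    bandwidth : width of the Gaussian kernel used for the density estimate
    max_jump  : largest distance over which a point may link to a denser one
    Returns (labels, density); labels[i] is the index of the root point
    (local density maximum) that point i ends up attached to.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    # Pairwise squared Euclidean distances
    diff = points[:, None, :] - points[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)
    # Unnormalized Gaussian kernel density evaluated at every sample
    density = np.exp(-0.5 * d2 / bandwidth ** 2).sum(axis=1)
    # Link each point to its nearest neighbour with strictly higher density
    parent = np.arange(n)
    for i in range(n):
        higher = np.where((density > density[i]) & (d2[i] <= max_jump ** 2))[0]
        if higher.size:
            parent[i] = higher[np.argmin(d2[i, higher])]
    # Follow the links until a root (a point with no denser neighbour) is reached
    labels = np.empty(n, dtype=int)
    for i in range(n):
        j = i
        while parent[j] != j:
            j = parent[j]
        labels[i] = j
    return labels, density

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two synthetic, well-separated groups standing in for two motifs
    data = np.vstack([rng.normal(0.0, 0.1, (50, 2)),
                      rng.normal(5.0, 0.1, (50, 2))])
    labels, _ = density_mode_clusters(data, bandwidth=0.3, max_jump=2.0)
    print("number of clusters found:", len(set(labels.tolist())))
```

Because every link points to a strictly denser sample, the chains cannot form cycles and always end at a local maximum. The number of clusters is therefore determined by the data and the two length scales rather than fixed in advance.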

METHODS

Hydrogen Bond Definitions

Clustering Parameters

Dihedral Angles for Secondary

Clustering and Secondary Structure

Comparison of Secondary-Structure Definitions

Smooth Overlap of Atomic Positions

Brief Introduction to SOAP

Supervised Classification

Hydrogen Bonds

Dihedral Angles and Protein

SOAP Environments


Journal: Frontiers in Molecular Biosciences
Publication Date: Apr 18, 2019
Citations: 11
License type: CC BY 4.0


