Abstract
The structure of a protein provides insight into its physiological interactions with other components of the cellular soup. Methods that predict putative structures from sequences typically yield multiple, closely ranked possibilities. A critical component in the process is the model quality assessing program (MQAP), which selects the best candidate from this pool of structures. Here, we present a novel MQAP based on the physical properties of sidechain atoms. The method assesses the quality of protein structures using the electrostatic potential difference (EPD) of Cβ atoms in consecutive residues. We demonstrate that the EPDs of Cβ atoms on consecutive residues provide unique signatures of the amino acid types. The EPDs of Cβ atoms are learnt from a set of 1000 non-homologous protein structures with a resolution cutoff of 1.6 Å obtained from the PISCES database. Based on the Boltzmann hypothesis that lower energy conformations are proportionately sampled more, and on Anfinsen's thermodynamic hypothesis that the native structure of a protein is the minimum free energy state, we hypothesize that the deviation of observed EPD values from the mean values obtained in the learning phase is minimized in the native structure. We achieved an average specificity of 0.91, 0.94 and 0.93 on the hg_structal, 4state_reduced and ig_structal decoy sets, respectively, taken from the Decoys 'R' Us database. The source code and manual are available at https://github.com/sanchak/mqap and permanently available at DOI 10.5281/zenodo.7134.
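The selection idea in the abstract can be illustrated with a minimal sketch. It is not the published implementation: the table `MEAN_EPD`, its values, the function names, and the choice of mean absolute deviation as the score are all assumptions made for illustration, and EPD values are treated as being in arbitrary units.

```python
from statistics import mean

# Hypothetical learned table: mean EPD between the C-beta atoms of each
# ordered pair of consecutive residue types. Values are invented.
MEAN_EPD = {
    ("ALA", "GLY"): 12.0,
    ("GLY", "SER"): -30.0,
    ("SER", "ALA"): 5.0,
}


def epd_deviation_score(residue_types, observed_epds, mean_epd=MEAN_EPD):
    """Mean absolute deviation of observed EPDs from the learned means.

    residue_types: residue names along the chain.
    observed_epds: one EPD per consecutive residue pair.
    """
    if len(observed_epds) != len(residue_types) - 1:
        raise ValueError("need exactly one EPD per consecutive residue pair")
    pairs = zip(residue_types, residue_types[1:])
    deviations = [
        abs(epd - mean_epd[pair])
        for pair, epd in zip(pairs, observed_epds)
        if pair in mean_epd  # skip pair types missing from the table
    ]
    if not deviations:
        raise ValueError("no residue pair was found in the learned table")
    return mean(deviations)


def pick_best(candidates, mean_epd=MEAN_EPD):
    """Return the name of the candidate with the smallest deviation score."""
    return min(
        candidates,
        key=lambda name: epd_deviation_score(*candidates[name], mean_epd),
    )


sequence = ["ALA", "GLY", "SER", "ALA"]
candidates = {
    "near_native": (sequence, [12.5, -29.0, 5.5]),
    "decoy": (sequence, [40.0, 10.0, -20.0]),
}
best = pick_best(candidates)
```

Under these assumptions, the candidate whose EPDs lie closest to the learned means is ranked first, which mirrors the hypothesis that the deviation is minimized in the native structure.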
Highlights
The challenge of deriving the native structure of a protein from its sequence has intrigued researchers for decades[1].
For a system in thermodynamic equilibrium, statistical physics hypothesizes that the accessible states are populated with a frequency which depends on the free energy of the state and is given by the Boltzmann distribution.
The Boltzmann hypothesis states that if the database of known native protein structures is assumed to be a statistical system in thermodynamic equilibrium, specific structural features would be populated based on the free energy of the protein conformational state.
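As a concrete illustration of the Boltzmann distribution mentioned above, the following sketch computes the equilibrium population of each state as exp(-E/kT) divided by the partition function. The function name, the default temperature, and the example energies are assumptions chosen for illustration; energies are taken to be in kcal/mol.

```python
import math

# Boltzmann constant expressed per mole (the gas constant), in kcal/(mol*K).
K_B = 0.0019872041


def boltzmann_populations(energies, temperature=298.15):
    """Return the equilibrium population of each state.

    Each population is exp(-E_i / kT) / Z, where Z is the sum of the
    Boltzmann factors over all states.
    """
    kt = K_B * temperature
    # Shifting by the minimum energy leaves the populations unchanged
    # and avoids overflow in exp() for large energy values.
    e_min = min(energies)
    weights = [math.exp(-(e - e_min) / kt) for e in energies]
    partition = sum(weights)
    return [w / partition for w in weights]


# Three hypothetical states with free energies of 0, 1 and 2 kcal/mol.
populations = boltzmann_populations([0.0, 1.0, 2.0])
```

Lower energy states receive larger populations, which is the sense in which lower energy conformations are sampled proportionately more.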
Summary
The challenge of deriving the native structure of a protein from its sequence has intrigued researchers for decades[1]. Sippl reasoned that the frequencies of occurrence of structural features such as interatomic distances in the database of known protein structures could be used to assign a free energy (potential of mean force) for a given protein conformation[21,22]. This statistical potential can be used to discriminate the native structure[23,24,25,26,27]. The paramount importance of obtaining high quality protein structures from sequences using in silico methods can be estimated by the effort invested by researchers every two years[34] to evaluate both structure prediction tools[35] and MQAPs[17,34,36].
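The step from observed frequencies to a potential of mean force can be sketched with the inverse Boltzmann relation, E = -kT ln(f_obs / f_ref). This is a simplified illustration rather than Sippl's exact formulation: the function name, the pseudocount used to avoid taking the logarithm of zero, the reference counts, and the example numbers are all assumptions.

```python
import math


def potential_of_mean_force(observed_counts, reference_counts,
                            kt=1.0, pseudocount=1.0):
    """Inverse Boltzmann energies, one per bin, in units of kT.

    observed_counts: how often each feature bin (for example a distance
    bin) occurs in the database of known structures.
    reference_counts: how often each bin occurs in a reference state.
    """
    if len(observed_counts) != len(reference_counts):
        raise ValueError("observed and reference must have the same bins")
    n_bins = len(observed_counts)
    obs_total = sum(observed_counts) + pseudocount * n_bins
    ref_total = sum(reference_counts) + pseudocount * n_bins
    energies = []
    for obs, ref in zip(observed_counts, reference_counts):
        f_obs = (obs + pseudocount) / obs_total
        f_ref = (ref + pseudocount) / ref_total
        energies.append(-kt * math.log(f_obs / f_ref))
    return energies


# Hypothetical two-bin example: the first bin is over-represented in the
# database relative to the reference state, the second is depleted.
pmf = potential_of_mean_force([90, 10], [50, 50])
```

Bins that occur more often than in the reference state receive a negative (favourable) energy, and depleted bins receive a positive one; summing such terms over a conformation gives a score that can be compared across candidate structures.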