Abstract

Background: Cryo-electron microscopy (cryo-EM) has become a major technique for protein structure determination. However, due to the low quality of cryo-EM density maps, many protein structures derived from cryo-EM contain outliers introduced during the modeling process. Current protein model validation systems lack features specific to cryo-EM proteins, so they are insufficient for identifying outliers in these structures. Methods: This study introduces an efficient unsupervised outlier detection model for validating protein models built with cryo-EM. The model uses a high-resolution X-ray dataset (<1.5 Å) as the reference dataset. The distal block distance, side-chain length, phi, psi, and first chi angle of the residues in the reference dataset are collected and stored as a database of histogram-based outlier scores (HBOS). The HBOS value of a residue in a target cryo-EM protein can then be read from this database. Results: Protein residues with an HBOS value greater than ten are labeled as outliers by default. Four datasets containing proteins derived from cryo-EM density maps were tested with this probabilistic anomaly detection model. Conclusions: Based on the proposed model, a visualization assistant tool was designed for Chimera, a protein visualization platform.
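The scoring step described in the abstract can be illustrated with a minimal sketch of a generic histogram-based outlier score. It is not the authors' implementation: the function names `fit_hbos` and `hbos_score`, the equal-width binning, the bin count, the floor value `eps`, the natural logarithm, and the synthetic reference data are all assumptions made for illustration. The five columns stand in for the five residue features (distal block distance, side-chain length, phi, psi, first chi angle), and the threshold of ten is taken from the abstract.

```python
import numpy as np

def fit_hbos(reference, n_bins=10, eps=1e-6):
    """Build one normalized histogram per feature column (hypothetical helper)."""
    model = []
    for col in reference.T:
        counts, edges = np.histogram(col, bins=n_bins)
        heights = counts / counts.max()      # tallest bin gets height 1
        heights = np.maximum(heights, eps)   # floor empty bins to avoid log(1/0)
        model.append((edges, heights))
    return model

def hbos_score(model, samples, eps=1e-6):
    """Sum log(1 / bin height) over all features for each sample."""
    samples = np.atleast_2d(samples)
    scores = np.zeros(samples.shape[0])
    for j, (edges, heights) in enumerate(model):
        x = samples[:, j]
        # Locate the bin of each value; clip so the right edge maps to the last bin.
        idx = np.clip(np.searchsorted(edges, x, side="right") - 1,
                      0, len(heights) - 1)
        # Values outside the reference range get the floor height (assumption).
        out_of_range = (x < edges[0]) | (x > edges[-1])
        h = np.where(out_of_range, eps, heights[idx])
        scores += np.log(1.0 / h)
    return scores

# Synthetic stand-in for the reference dataset: 5000 residues, 5 features.
rng = np.random.default_rng(0)
reference = rng.normal(size=(5000, 5))
model = fit_hbos(reference)

# One typical residue and one with an extreme value in the last feature.
targets = np.array([[0.0, 0.0, 0.0, 0.0, 0.0],
                    [0.0, 0.0, 0.0, 0.0, 100.0]])
scores = hbos_score(model, targets)
is_outlier = scores > 10   # default threshold quoted in the abstract
```

In this sketch the histograms play the role of the precomputed HBOS database: fitting happens once on the reference data, and scoring a target residue only requires a bin lookup per feature.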

Highlights

  • As of 28 July 2019, the Research Collaboratory for Structural Bioinformatics (RCSB) Protein Data Bank (PDB) [10] contains 3528 proteins solved by Electron Microscopy (EM) techniques, which is about 2% of the 154,243 structures on RCSB PDB

  • Due to the absence of validation tools for protein structures built from Cryo-electron microscopy (cryo-EM), the cryo-EM proteins and the

  • The data set, referred to as X-ray-1.5, contains 9131 protein structures that are solved with X-ray crystallography and have a resolution better than or equal to 1.5 Å


Summary

Introduction

Cryo-electron microscopy (cryo-EM) is becoming an essential method for producing three-dimensional atomic structures of proteins, owing to the application of image-processing techniques [1,2,3,4] in protein modeling as well as the introduction of direct electron detectors [5,6,7]. Cryo-EM can help obtain protein structures in near-native conditions by freezing a macromolecule solution [8], without the concerns of crystallization in X-ray or high-energy damage in NMR. These features attract biologists to cryo-EM in cases where X-ray and NMR struggle [9]. As of 28 July 2019, the Research Collaboratory for Structural Bioinformatics (RCSB) Protein Data Bank (PDB) [10] contains 3528 proteins solved by EM techniques, which is about 2% of the 154,243 structures on RCSB PDB. Due to the low quality of cryo-EM density maps, many protein structures derived from cryo-EM contain outliers introduced during the modeling process. This study introduces an efficient unsupervised outlier detection model for validating protein models built with cryo-EM. Four datasets containing proteins derived from cryo-EM density maps were tested with this probabilistic anomaly detection model. Based on the proposed model, a visualization assistant tool was designed for Chimera, a protein visualization platform.
