Deep Learning to Predict Protein Backbone Structure from High-Resolution Cryo-EM Density Maps

Dong Si, Liguo Wang, Renzhi Cao, Jianlin Cheng, Jonas Pfab, Spencer A Moritz, Jie Hou, Tianqi Wu

doi:10.1038/s41598-020-60598-y

Abstract

Cryo-electron microscopy (cryo-EM) has become a leading technology for determining protein structures. Recent advances in this field have allowed for atomic resolution. However, predicting the backbone trace of a protein has remained a challenge on all but the most pristine density maps (<2.5 Å resolution). Here we introduce a deep learning model that uses a set of cascaded convolutional neural networks (CNNs) to predict Cα atoms along a protein’s backbone structure. The cascaded-CNN (C-CNN) is a novel deep learning architecture composed of multiple CNNs, each predicting a specific aspect of a protein’s structure. This model predicts secondary structure elements (SSEs), backbone structure, and Cα atoms, combining the results of each to produce a complete prediction map. The cascaded-CNN is a semantic segmentation image classifier and was trained using thousands of simulated density maps. This method is largely automatic and only requires a recommended threshold value for each protein density map. A specialized tabu-search path walking algorithm was used to produce an initial backbone trace with Cα placements. A helix-refinement algorithm made further improvements to the α-helix SSEs of the backbone trace. Finally, a novel quality assessment-based combinatorial algorithm was used to effectively map protein sequences onto Cα traces to obtain full-atom protein structures. This method was tested on 50 experimental maps between 2.6 Å and 4.4 Å resolution. It outperformed several state-of-the-art prediction methods, including Rosetta de-novo, MAINMAST, and a Phenix-based method, by producing the most complete predicted protein structures, as measured by the percentage of found Cα atoms. This method accurately predicted 88.9% (mean) of the Cα atoms within 3 Å of a protein’s backbone structure, surpassing the 66.8% mark achieved by the leading alternate method (the Phenix-based fully automatic method) on the same set of density maps. The C-CNN also achieved an average root-mean-square deviation (RMSD) of 1.24 Å on a set of 50 experimental density maps that was also tested with the Phenix-based fully automatic method. The source code and demo of this research have been published at https://github.com/DrDongSi/Ca-Backbone-Prediction.
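The two evaluation figures quoted in the abstract, the percentage of Cα atoms found within 3 Å and the RMSD, can be illustrated with a short sketch. This is not the authors' evaluation code: the function names `match_ca_atoms` and `coverage_and_rmsd`, the greedy one-to-one pairing, and the choice to compute RMSD only over matched pairs are all assumptions made for illustration. The 3 Å cutoff is the only value taken from the text.

```python
import numpy as np


def match_ca_atoms(predicted, native, cutoff=3.0):
    """Greedily pair native and predicted C-alpha atoms (hypothetical helper).

    Candidate pairs are visited by increasing distance; each atom is used
    at most once, and pairs farther apart than `cutoff` (in angstroms) are
    rejected. The greedy strategy is an assumption, not the paper's method.
    """
    predicted = np.asarray(predicted, dtype=float).reshape(-1, 3)
    native = np.asarray(native, dtype=float).reshape(-1, 3)

    # Pairwise distance matrix of shape (n_native, n_predicted).
    dists = np.linalg.norm(native[:, None, :] - predicted[None, :, :], axis=2)

    pairs = []
    used_native, used_predicted = set(), set()
    for flat_index in np.argsort(dists, axis=None):
        i, j = np.unravel_index(flat_index, dists.shape)
        distance = dists[i, j]
        if distance > cutoff:
            break  # all remaining candidate pairs are even farther apart
        if i in used_native or j in used_predicted:
            continue
        used_native.add(i)
        used_predicted.add(j)
        pairs.append((int(i), int(j), float(distance)))
    return pairs


def coverage_and_rmsd(predicted, native, cutoff=3.0):
    """Return (percent of native C-alpha atoms matched, RMSD over matches)."""
    native = np.asarray(native, dtype=float).reshape(-1, 3)
    pairs = match_ca_atoms(predicted, native, cutoff)
    coverage = 100.0 * len(pairs) / len(native)
    if not pairs:
        return coverage, float("nan")
    squared = np.array([d for _, _, d in pairs]) ** 2
    return coverage, float(np.sqrt(squared.mean()))


if __name__ == "__main__":
    # Toy coordinates (angstroms), spaced 3.8 apart like consecutive C-alphas.
    native = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [7.6, 0.0, 0.0], [11.4, 0.0, 0.0]]
    predicted = [[0.0, 0.0, 1.0], [3.8, 0.0, 0.0], [7.6, 5.0, 0.0]]
    coverage, rmsd = coverage_and_rmsd(predicted, native)
    print(f"matched {coverage:.1f}% of C-alpha atoms, RMSD {rmsd:.3f} A")
```

In the toy example, two of the four native atoms have a predicted atom within 3 Å, so the coverage is 50% and the RMSD over the two matched pairs is about 0.707 Å. A stricter evaluation might use an optimal assignment instead of greedy pairing; the greedy version is used here only because it is short and easy to follow.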

Highlights

Proteins perform a vast array of functions within organisms
The experimental cryo-electron microscopy (cryo-EM) density maps and the corresponding Protein Data Bank (PDB) entries were downloaded from the Electron Microscopy Data Bank (EMDB) for the purpose of final evaluation on experimental data
We found during development that the biggest improvements in accuracy came as the result of adding more convolutional neural networks to the C-CNN

Summary

Introduction

Proteins perform a vast array of functions within organisms. From molecule transportation, to mechanical cellular support, to immune protection, proteins are the central building blocks of life in the universe[1]. The cryo-EM field is slowly moving toward producing many high-resolution maps in a single project or study[12,13]. Some of these studies involve very large protein assemblies of many subunits. The backbone structure consists of a repeated sequence of three atoms (nitrogen, alpha-carbon, carbon). Of these three atoms, the alpha-carbon (Cα) is especially important because it is the central point of each amino acid residue within the protein. In addition to the backbone features of a protein, some of the most visually dominant features of cryo-EM density maps are the secondary structure elements (SSEs); see Fig. 1B. When imaged with cryo-EM, turns/loops often appear faint due to their relatively low electron density. This makes them one of the most challenging SSEs to classify. At near-atomic resolution, one can in general still recognize β-sheets and α-helix pitches.

Objectives

Methods

Results

Discussion

Conclusion