Metalloproteins require metal ions as cofactors to catalyze specific reactions with remarkable efficiency and specificity. In various electron transfer reactions, the metals in the active sites change their oxidation states to facilitate the biochemical reactions. Cryogenic electron microscopy (cryoEM), X-ray crystallography, and X-ray free electron laser (XFEL) crystallography are used to image metalloproteins and understand their reaction mechanisms. However, radiation damage in cryoEM and X-ray crystallography, and the difficulty in XFEL crystallography of generating homogeneous crystals and maintaining appropriate experimental conditions for all of them, may alter these oxidation states. Here, we build machine learning models trained on a large data set from the Cambridge Crystallographic Data Centre to evaluate the metal oxidation states. The models yield high accuracy scores (from 82% to 94%) for all metals in the small molecules. We then used them to predict the oxidation states of more than 30 000 metal clusters in metalloproteins with Fe, Mn, Co, and Cu in their active sites. We found that most of the metals exist in the lower oxidation states (Fe2+ 77%, Mn2+ 85%, Co2+ 65%, and Cu+ 64%), and these populations correlate with the standard reduction potentials of the metal ions. Furthermore, we found no clear correlation between these populations and the resolution of the structures, which suggests that the predictions do not depend significantly on resolution. Our models represent a valuable tool for evaluating the oxidation states of the metals in metalloproteins imaged with different techniques. The data files and the machine learning code are available in a public GitHub repository: https://github.com/mamin03/OxitationStatesMetalloprotein.git.
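As an illustration of how per-metal populations such as those quoted above could be tallied from a list of predicted oxidation states, the following is a minimal sketch. It is not the code from the repository: the function name `oxidation_state_populations`, the `(metal, state)` input format, and the toy predictions are all assumptions made for this example, and the numbers are not data from the study.

```python
from collections import Counter, defaultdict


def oxidation_state_populations(predictions):
    """Return {metal: {state: percent}} from (metal, predicted_state) pairs.

    `predictions` is a hypothetical input format: one tuple per metal site,
    holding the element symbol and the predicted integer oxidation state.
    """
    counts = defaultdict(Counter)
    for metal, state in predictions:
        counts[metal][state] += 1

    populations = {}
    for metal, tally in counts.items():
        total = sum(tally.values())
        # Convert raw counts into percentages of all sites for this metal.
        populations[metal] = {
            state: 100.0 * n / total for state, n in tally.items()
        }
    return populations


# Toy predictions for illustration only (NOT results from the paper).
toy_predictions = [
    ("Fe", 2), ("Fe", 2), ("Fe", 3), ("Fe", 2),
    ("Cu", 1), ("Cu", 2),
]

populations = oxidation_state_populations(toy_predictions)
for metal, shares in populations.items():
    print(metal, shares)
```

In practice the input pairs would come from whatever classifier assigns an oxidation state to each metal site; the tally itself is independent of the model used.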