Abstract

This data article makes available the informed computation of the whole Protein Data Bank (PDB) to investigate diffraction anisotropy on a large scale and to perform statistics. These data have been investigated in detail in “X-ray diffraction reveals the intrinsic difference in the physical properties of membrane and soluble proteins” [1]. Diffraction anisotropy is traditionally associated with an absence of contacts between macromolecules within the crystal in a given direction of space. There are, however, many cases that do not follow this empirical rule. To investigate and sort out this discrepancy, we computed diffraction anisotropy for every entry of the PDB and placed the results in the context of relevant metrics, in order to compare X-ray diffraction in reciprocal space with crystal packing in real space. These metrics were either extracted from PDB files when available (resolution, space groups, cell parameters, solvent content) or calculated using standard procedures (anisotropy, crystal contacts, presence of ligands). More specifically, we separated entries to compare soluble vs membrane proteins, and further separated the latter into subcategories according to their insertion in the membrane, function, or type of crystallization (Type I vs Type II crystal packing). This informed database is being made available to investigators in raw and curated formats that can be re-used for further downstream studies. This dataset is useful for testing ideas and for assessing hypotheses based on statistical analysis.

Highlights

  • Biology; crystallography; Excel sheet document; advanced computation on Protein Data Bank [2] data; raw and curated. Each Protein Data Bank entry was retrieved for both experimental diffraction data and deposited model, and further processed and classified according to biologically driven criteria

  • Separation between soluble and membrane proteins; membrane proteins were further separated into different subclasses

Summary

Data mining and computation

As of February 24th, 2016, a local copy of the RCSB Protein Data Bank (PDB) was made, including all the deposited structures as PDB-formatted coordinate files as well as all the crystallographic structure factors in mmCIF format. At that date, out of 115,888 available structures, 103,530 had been solved by X-ray crystallography and 92,995 related structure factor files were accessible. For further processing, the latter were converted from mmCIF to CCP4 MTZ format with the sf-convert software version 1.204 (developed at RCSB and downloadable at http://deposit.pdb.org/software). The differences came from the fact that a number of structure factor files did not contain intensity data and/or accurate information (i.e. missing or null σ(F), σ(I) values, etc.). All these data were joined, sorted by PDB entry code and imported into an Excel 2013 (Microsoft Corporation) spreadsheet.
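The final step, joining per-entry metrics and sorting them by PDB entry code, can be illustrated with a minimal Python sketch. It is not the procedure actually used for this dataset: the column names (`pdb_id`, `resolution`, `space_group`, `anisotropy`), the entry codes and all values below are hypothetical placeholders, and the output is plain CSV, which a spreadsheet program can import.

```python
import csv
import io


def join_by_pdb_code(*tables):
    """Merge several per-entry tables into one row per PDB entry code.

    Each table is a list of dicts that all carry a 'pdb_id' key
    (a hypothetical column name chosen for this sketch).
    """
    merged = {}
    for table in tables:
        for row in table:
            # Normalise the code so that '1abc' and '1ABC' are joined.
            code = row["pdb_id"].strip().upper()
            entry = merged.setdefault(code, {"pdb_id": code})
            entry.update({k: v for k, v in row.items() if k != "pdb_id"})
    # Return the rows sorted by PDB entry code.
    return [merged[code] for code in sorted(merged)]


def to_csv(rows):
    """Serialise the joined rows as CSV text; missing metrics stay empty."""
    other_columns = sorted({key for row in rows for key in row} - {"pdb_id"})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["pdb_id"] + other_columns, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder inputs: metrics read from file headers and metrics computed
# separately. None of these values come from the actual dataset.
header_metrics = [
    {"pdb_id": "2xyz", "resolution": "2.1", "space_group": "P 21 21 21"},
    {"pdb_id": "1abc", "resolution": "1.8", "space_group": "C 1 2 1"},
]
computed_metrics = [
    {"pdb_id": "1ABC", "anisotropy": "0.12"},
]

rows = join_by_pdb_code(header_metrics, computed_metrics)
csv_text = to_csv(rows)
print(csv_text)
```

Entries that lack a computed metric, as happens when a structure factor file has no usable intensity data, simply keep an empty cell in this sketch instead of being dropped.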

Curation
Subsets extraction
Code availability