Abstract

Recently, the IUCr (International Union of Crystallography) initiated the formation of a Diffraction Data Deposition Working Group with the aim of developing standards for the representation of raw diffraction data associated with the publication of structural papers. Archiving of raw data serves several goals: to improve the record of science, to verify reproducibility, to allow detailed checks of scientific data, to safeguard against fraud and to allow reanalysis with future improved techniques. A means of studying this issue is to submit exemplar publications with associated raw data and metadata. In a recent study of the binding of cisplatin and carboplatin to histidine in lysozyme crystals under several conditions, the possible effects of the equipment and X-ray diffraction data-processing software on the occupancies and B factors of the bound Pt compounds were compared. Initially, 35.3 GB of data were transferred from Manchester to Utrecht to be processed with EVAL. A detailed description and discussion of the availability of metadata was published in a paper that was linked to a local raw data archive at Utrecht University and also mirrored at the TARDIS raw diffraction data archive in Australia. Because these raw diffraction data sets are available with the article, the diffraction community can make its own evaluation. This led one of the authors of XDS (K. Diederichs) to re-integrate the data from crystals that supposedly contained only bound carboplatin, resulting in the analysis of partially occupied chlorine anomalous electron densities near the Pt-binding sites and the use of several criteria to assess the diffraction resolution limit more carefully. General arguments for archiving raw data, the possibilities of doing so and the requirement of resources are discussed. The problems associated with a partially unknown experimental setup, which preferably should be available as metadata, are also discussed. Current thoughts on data compression are summarized; compression could be a solution especially for pixel-device data sets with fine slicing that may otherwise present an unmanageable amount of data.

Highlights

  • A Diffraction Data Deposition Working Group has been set up by the IUCr to consider the benefits, possibilities and costs of archiving raw diffraction images

  • The Joint Center for Structural Genomics has created a unique repository of X-ray crystallographic datasets for the structures that it has solved and deposited in the Protein Data Bank

  • As a follow-up to the research performed with the archived data, additional X-ray diffraction data sets were collected in Manchester from hen egg-white lysozyme (HEWL) crystals co-crystallized with carboplatin without sodium chloride (Tanley, Diederichs et al., 2013b) to eliminate the partial conversion of carboplatin to cisplatin observed previously, and were processed with SAINT, EVAL and XDS


Summary

Introduction

A Diffraction Data Deposition Working Group has been set up by the IUCr to consider the benefits, possibilities and costs of archiving raw diffraction images. The Joint Center for Structural Genomics (JCSG) archive contains the experimental data and analyses from the data collection, data reduction, phasing, density modification, model building and refinement of JCSG structures. Funding agencies are requesting or requiring data-management policies (including provision for data retention and access) to be taken into account when awarding grants: see, for example, the Research Councils UK Common Principles on Data Policy (http://www.rcuk.ac.uk/research/Pages/DataPolicy.aspx) and the Digital Curation Centre overview of funding policies in the UK (http://www.dcc.ac.uk/resources/policy-and-legal/overviewfunders-data-policies). It is worth noting that these policies do not explicitly differentiate amongst derived, processed and raw data. In addition, the problems associated with a partially unknown experimental setup, which preferably should be available as metadata, are discussed.

Why store raw data?
Transferring and storing raw diffraction images
Costs of the storage of terabytes of data
Metadata
Data processing
Detector gain and standard deviations
The usefulness of reintegrating data
Findings
Conclusions
