Abstract
A synthetic data set demonstrating a particularly challenging case of indexing ambiguity in the context of radiation damage was generated. This set shall serve as a standard benchmark and reference point for the ongoing development of new methods and new approaches to robust structure solution when single-crystal methods are insufficient. Of the 100 short wedges of data, only the first 36 are currently necessary to solve the structure by 'cheating', or using the correct reference structure as a guide. The total wall-clock time and number of crystals required to solve the structure without cheating is proposed as a metric for the efficacy and efficiency of a given multi-crystal automation pipeline.
Highlights
Data sets that challenge the capabilities of modern structure-solution procedures, algorithms and software are difficult for developers to obtain for a very simple reason: as soon as a solution is reached, the data set is no longer considered to be challenging.
Data sets that are recalcitrant to current approaches are not available in public databases such as the Protein Data Bank (Berman et al., 2002) or image repositories (Grabowski et al., 2016; Morin et al., 2013), which contain only data used for solved structures.
There is a fundamental limit to how small a protein crystal can be and still yield a complete data set (Holton & Frankel, 2010), so as beams and crystals become smaller and smaller the use of multi-crystal data sets becomes unavoidable.
Summary
Data sets that challenge the capabilities of modern structure-solution procedures, algorithms and software are difficult for developers to obtain for a very simple reason: as soon as a solution is reached, the data set is no longer considered to be challenging. When testing the limits of software, it is generally much more useful to know ahead of time what the correct result will be. This enables the detection and optimization of partially successful solutions at every point in the process, even if downstream procedures fail. There is a fundamental limit to how small a protein crystal can be and still yield a complete data set (Holton & Frankel, 2010), so as beams and crystals become smaller and smaller the use of multi-crystal data sets becomes unavoidable. With this strategy, much of the useful life of the sample is used up in the first few images (Evans et al., 2011), and the challenge is to reassemble all of the data from a large number of highly incomplete data-collection runs, or wedges.