Abstract

A peak-finding algorithm for serial crystallography (SX) data analysis based on the principle of 'robust statistics' has been developed. Statistically robust methods are generally less sensitive to departures from model assumptions and are particularly effective when analysing mixtures of probability distributions. For example, these methods enable the data to be separated into a group of inliers (i.e. the background noise) and a group of outliers (i.e. Bragg peaks). Our robust statistics algorithm has two key advantages, which are demonstrated through testing on multiple SX data sets. First, it is relatively insensitive to the exact value of the input parameters and hence requires minimal optimization. This is critical for the algorithm to run unsupervised, allowing automated selection or 'vetoing' of SX diffraction data. Secondly, the processing of individual diffraction patterns can be easily parallelized. The algorithm can therefore analyse data from multiple detector modules simultaneously, making it well suited to real-time data processing. These characteristics mean that the robust peak finder (RPF) algorithm will be particularly beneficial for the new class of MHz X-ray free-electron laser sources, which generate large amounts of data in a short period of time.
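
The inlier/outlier separation described in the abstract can be illustrated with a minimal sketch. It is not the RPF algorithm itself: it substitutes a simple median and median-absolute-deviation (MAD) estimate of the background for the robust model fitting used in the paper. The function name `robust_outlier_mask`, the `n_sigma` parameter and the synthetic detector module are all assumptions made for illustration.

```python
import numpy as np


def robust_outlier_mask(pixels, n_sigma=6.0):
    """Flag pixels far above a robust estimate of the background.

    Hypothetical helper: median/MAD is a stand-in for the robust
    model fitting used by the RPF, not the authors' method.
    """
    pixels = np.asarray(pixels, dtype=float)
    median = np.median(pixels)
    # MAD scaled by 1.4826 estimates the standard deviation of
    # Gaussian inliers while largely ignoring bright outliers.
    sigma = 1.4826 * np.median(np.abs(pixels - median))
    if sigma == 0:
        return np.zeros(pixels.shape, dtype=bool)
    return (pixels - median) > n_sigma * sigma


# Synthetic detector module: Gaussian background plus three bright pixels.
rng = np.random.default_rng(0)
module = rng.normal(loc=10.0, scale=2.0, size=(64, 64))
peak_positions = [(5, 7), (20, 41), (50, 13)]
for row, col in peak_positions:
    module[row, col] += 200.0

mask = robust_outlier_mask(module)
found = sorted(map(tuple, np.argwhere(mask).tolist()))
print(found)
```

Because the threshold is derived from the median and MAD rather than the mean and standard deviation, the bright pixels barely shift the background estimate. Each module is processed independently, so in principle the same function could be applied to several modules in parallel.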

Highlights

  • X-ray crystallography is one of the most important tools in structural biology, responsible for over 80% of the biomolecular structures solved today and deposited in the Protein Data Bank (Berman et al., 2003).

  • Serial crystallography experiments performed at X-ray free-electron laser (XFEL) facilities such as the European XFEL (EuXFEL) generate massive data sets that can be as large as 1 petabyte (10^15 bytes) per experiment (Wiedorn et al., 2018).

  • In this paper we introduce an outlier-detection algorithm, termed the ‘robust peak finder’, that identifies crystal diffraction patterns in serial crystallography experiments.


Summary

Introduction

X-ray crystallography is one of the most important tools in structural biology, responsible for over 80% of the biomolecular structures solved today and deposited in the Protein Data Bank (Berman et al., 2003). The goal is to filter data sets by rejecting data that are unusable or contain no useful information, whilst preserving all images that contain any signal produced by interaction of the beam with the sample. This need has motivated the current effort to develop a robust and efficient method for detecting Bragg peaks, which can be deployed to reduce the size of the data set obtained during SX experiments. Even though peak-finding methods have been used successfully before, their parameters often need to be optimized during the experiment before they work effectively. This limits their reliability and effectiveness in the context of online data processing, and has motivated the development of a more robust approach, which is the subject of this paper. We conclude with a discussion of the benefits of using the algorithm for online SX data monitoring.

Background
Peak finding
Robust model fitting
RPF implementation
CXIDB32 data set
EuXFEL commissioning data set
PETRA III P11 data set
Sensitivity analysis
Pre-calculation of global threshold
Conclusion
Findings
Funding information