Abstract

A promising new route for structural biology is single-particle imaging with an X-ray Free-Electron Laser (XFEL). This method has the advantage that the samples do not require crystallization and can be examined at room temperature. However, high-resolution structures can only be obtained from a sufficiently large number of diffraction patterns of individual molecules, so-called single particles. Here, we present a method that efficiently identifies single particles in very large XFEL datasets, operates at low signal levels, and is tolerant of background noise. This method uses supervised Geometric Machine Learning (GML) to extract low-dimensional feature vectors from a training dataset, fuse test datasets into the feature space of the training dataset, and separate the data into binary distributions of “single particles” and “non-single particles.” As a proof of principle, we tested the method on simulated and experimental datasets of the Coliphage PR772 virus. We created a training dataset and classified three types of test datasets. The first was a noise-free simulated test dataset, which gave near-perfect separation. The second was a set of simulated test datasets modified to reflect different levels of photon counts and background noise; these were used to quantify the predictive limits of our approach. The third was an experimental dataset collected at the Stanford Linear Accelerator Center. The single-particle identification for this experimental dataset was compared with previously published results, and GML was found to cover a wide photon-count range, outperforming other single-particle identification methods. Moreover, a major advantage of GML is its ability to retrieve single particles in the presence of structural variability.
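
The supervised workflow described above (learn a low-dimensional feature space from labelled training patterns, map test patterns into that same space, then split them into two classes) can be illustrated with a deliberately simple stand-in. The sketch below is not the GML method itself: it assumes a one-dimensional linear feature (the difference of the two class means) and hypothetical toy "patterns" of 16 pixels with Gaussian noise. All function names and parameters are illustrative.

```python
import random


def mean_vector(rows):
    """Element-wise mean of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def fit_projection(train_x, train_y):
    """Learn a 1-D feature axis and a decision threshold from labelled data.

    Assumption: a linear axis (difference of class means) stands in for the
    low-dimensional embedding; label 1 = single particle, 0 = non-single.
    """
    mu_single = mean_vector([x for x, y in zip(train_x, train_y) if y == 1])
    mu_other = mean_vector([x for x, y in zip(train_x, train_y) if y == 0])
    axis = [a - b for a, b in zip(mu_single, mu_other)]
    midpoint = [(a + b) / 2 for a, b in zip(mu_single, mu_other)]
    threshold = sum(w * m for w, m in zip(axis, midpoint))
    return axis, threshold


def embed(pattern, axis):
    """Map one pattern into the feature space learned from the training set."""
    return sum(w * v for w, v in zip(axis, pattern))


def classify(patterns, axis, threshold):
    """Binary split of test patterns in the training feature space."""
    return [1 if embed(p, axis) > threshold else 0 for p in patterns]


def toy_pattern(is_single, rng, n_pixels=16):
    """Hypothetical toy data: the two classes light up opposite pixel halves."""
    base = [1.0 if (i < n_pixels // 2) == bool(is_single) else 0.0
            for i in range(n_pixels)]
    return [b + rng.gauss(0.0, 0.3) for b in base]


rng = random.Random(0)

# Labelled training set: alternate single (1) and non-single (0) patterns.
train_y = [i % 2 for i in range(200)]
train_x = [toy_pattern(y, rng) for y in train_y]
axis, threshold = fit_projection(train_x, train_y)

# Test set: projected into the feature space learned from the training set.
test_y = [i % 2 for i in range(100)]
test_x = [toy_pattern(y, rng) for y in test_y]
pred = classify(test_x, axis, threshold)

accuracy = sum(p == t for p, t in zip(pred, test_y)) / len(test_y)
print(f"toy test accuracy: {accuracy:.2f}")
```

The point of the sketch is only the order of operations: the feature space is fixed by the training data, and test data are mapped into it rather than re-embedded on their own.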

Highlights

  • X-ray free-electron lasers (XFELs) generate femtosecond X-ray pulses with unprecedented brightness and high repetition rates, which have been used to determine biomolecular structures at high resolution and on ultra-short timescales [1]. This remarkable advance recently opened a path for breakthrough research in structural biology [2]. For example, serial femtosecond crystallography (SFX) has been successfully employed to determine structures with near-atomic resolution and femtosecond time dynamics [3,4]. SFX requires crystallization of the target biomolecules.

  • The data processing workflows for cryo-EM and Single-Particle Imaging (SPI) are quite similar, and both share a common requirement for achieving a high-resolution reconstruction: the single-particle identification step must retrieve a large number of snapshots in order to overcome the noise from the low signal levels.

  • We presented an efficient single-particle identification method based on Geometric Machine Learning.

Summary

INTRODUCTION

X-ray free-electron lasers (XFELs) generate femtosecond X-ray pulses with unprecedented brightness and high repetition rates, which have been used to determine biomolecular structures at high resolution and on ultra-short timescales. This remarkable advance recently opened a path for breakthrough research in structural biology. For example, serial femtosecond crystallography (SFX) has been successfully employed to determine structures with near-atomic resolution and femtosecond time dynamics. SFX requires crystallization of the target biomolecules. An ideal single-particle identification algorithm should reliably identify the subtle differences between single-particle and non-single-particle diffraction patterns and retrieve the desired single-particle diffraction patterns at low signal levels and high background noise. In this regard, initial efforts were based on unsupervised classification methods, such as manifold embedding, which clusters similar diffraction patterns into regions in a low-dimensional feature space, and Principal Component Analysis (PCA), which quantifies correlations among diffraction patterns. The data processing workflows for cryo-EM and single-particle imaging (SPI) are quite similar, and both share a common requirement for achieving a high-resolution reconstruction: the single-particle identification step must retrieve a large number of snapshots in order to overcome the noise from the low signal levels. Our results demonstrate that GML is a promising and efficient data analysis technique for single-particle identification in large datasets.
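
The need for a large number of snapshots at low signal levels can be made concrete with a small numerical sketch. It assumes photon counts per pixel follow Poisson statistics and, as a strong simplification, treats every snapshot as a noisy copy of the same pattern (particle orientation is ignored). The expected pattern and all names are hypothetical. Under these assumptions the error of the averaged pattern should shrink roughly as one over the square root of the number of snapshots.

```python
import math
import random


def poisson(lam, rng):
    """Sample a Poisson variate with Knuth's method (suitable for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def rms_error_of_average(expected, n_snapshots, rng):
    """RMS difference between the mean of noisy snapshots and the expected pattern."""
    sums = [0] * len(expected)
    for _ in range(n_snapshots):
        for i, lam in enumerate(expected):
            sums[i] += poisson(lam, rng)
    squared = [(s / n_snapshots - lam) ** 2 for s, lam in zip(sums, expected)]
    return math.sqrt(sum(squared) / len(expected))


rng = random.Random(1)

# Hypothetical expected photon count per pixel: well below one photon,
# i.e. a low-signal regime with a small constant background term.
expected = [0.05 + 0.5 * math.exp(-i / 10) for i in range(64)]

errors = {n: rms_error_of_average(expected, n, rng) for n in (10, 100, 1000)}
for n, err in errors.items():
    print(f"{n:5d} snapshots: RMS error {err:.4f}")
```

In a real experiment the snapshots must first be identified as single particles and oriented before they can be combined, which is why the identification step has to retain as many valid patterns as possible.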
