Abstract

Structural heterogeneity in single-particle cryo-electron microscopy (cryo-EM) data is a major challenge for high-resolution structure determination. Unsupervised classification can serve as the first step in assessing structural heterogeneity. However, traditional unsupervised classification algorithms, such as K-means clustering and maximum likelihood optimization, may misclassify more images as the signal-to-noise ratio (SNR) of the image data decreases, while demanding higher computational cost. Overcoming these limitations requires further development of clustering algorithms for high-performance cryo-EM data processing. Here we introduce an unsupervised single-particle clustering algorithm derived from a statistical manifold learning framework called generative topographic mapping (GTM). We show that, in the absence of input references, unsupervised GTM clustering improves classification accuracy by about 40% for low-SNR data. Applications to several experimental datasets suggest that our algorithm can detect subtle structural differences among classes through a hierarchical clustering strategy. After code optimization for a high-performance computing (HPC) environment, our software implementation generated thousands of reference-free class averages within hours in a massively parallel fashion. This substantially improves ab initio 3D reconstruction and assists in the computational purification of homogeneous datasets for high-resolution visualization.
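To make the GTM idea above more concrete, the following is a minimal sketch of the standard expectation-maximization (EM) formulation of GTM applied to vectorized data, treating each node of a 2D latent grid as one class. It is an illustration under stated assumptions, not the authors' ROME implementation: it omits MAP2D image alignment, the hierarchical strategy and all HPC parallelism. The function name `gtm_cluster` and every parameter default (grid size, number of radial basis function (RBF) centers, RBF width `sigma`, weight regularizer `reg`, iteration count) are hypothetical choices.

```python
import numpy as np

def gtm_cluster(T, grid_size=3, n_rbf=2, sigma=1.0, reg=1e-3, n_iter=50):
    """Fit a textbook GTM to T (N x D) by EM; return hard node labels and node images.

    Hypothetical sketch: each latent grid node plays the role of one class.
    """
    N, D = T.shape
    # Latent grid of grid_size x grid_size nodes on [-1, 1]^2.
    g = np.linspace(-1.0, 1.0, grid_size)
    X = np.array([(a, b) for a in g for b in g])                  # (K, 2)
    c = np.linspace(-1.0, 1.0, n_rbf)
    centers = np.array([(a, b) for a in c for b in c])            # (M, 2)
    # Gaussian RBF basis evaluated at every latent node, plus a bias column.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    Phi = np.hstack([np.exp(-d2 / (2 * sigma ** 2)), np.ones((len(X), 1))])

    # PCA initialisation: place the grid on the top-2 principal plane of T.
    mean = T.mean(axis=0)
    _, S, Vt = np.linalg.svd(T - mean, full_matrices=False)
    scale = S[:2] / np.sqrt(N)                                    # std along each PC
    Y0 = mean + (X * scale) @ Vt[:2]                              # (K, D)
    W = np.linalg.lstsq(Phi, Y0, rcond=None)[0]                   # (M+1, D)
    Y = Phi @ W
    # Broad initial noise precision: inverse of the mean per-dimension squared distance.
    beta = 1.0 / max(np.mean(((T[:, None, :] - Y[None]) ** 2).sum(-1)) / D, 1e-12)

    for _ in range(n_iter):
        # E-step: posterior responsibility R[k, n] of node k for data point n.
        dist = ((Y[:, None, :] - T[None, :, :]) ** 2).sum(-1)     # (K, N)
        logp = -0.5 * beta * dist
        logp -= logp.max(axis=0, keepdims=True)                   # numerical stability
        R = np.exp(logp)
        R /= R.sum(axis=0, keepdims=True)
        # M-step: regularised weighted least squares for W (Gaussian prior, precision reg).
        G = np.diag(R.sum(axis=1))
        A = Phi.T @ G @ Phi + (reg / beta) * np.eye(Phi.shape[1])
        W = np.linalg.solve(A, Phi.T @ R @ T)
        Y = Phi @ W
        # Update the noise precision with the new node positions.
        dist = ((Y[:, None, :] - T[None, :, :]) ** 2).sum(-1)
        beta = N * D / max((R * dist).sum(), 1e-12)
    return R.argmax(axis=0), Y
```

In an image setting, each row of `T` would be a flattened (and, in practice, pre-aligned) particle image, `labels[n]` is the latent node with the highest responsibility for image n, and averaging the images that share a label gives one class average. The smooth RBF mapping is what distinguishes this from plain K-means: neighboring nodes are constrained to represent similar images.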

Highlights

  • To quantify the classification accuracy of our approach, we generated a series of synthetic datasets with various SNRs, each composed of 50,000 simulated images

  • When the SNR was decreased to 0.005, the classification accuracies of the conventional approaches were reduced further (Fig 2F and 2G; S3E and S3F Fig). These results suggest that a high level of noise significantly compromises the performance of data clustering by conventional principal component analysis (PCA)/K-means clustering approaches

  • We introduced a generative topographic mapping (GTM)-based unsupervised clustering method incorporating MAP2D-based image alignment, and implemented this approach in the open-source software ROME
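The synthetic benchmark and the PCA/K-means baseline described in the highlights can be mimicked at toy scale. The sketch below assumes the common definition SNR = var(signal) / var(noise), adds white Gaussian noise at that ratio, clusters with PCA followed by Lloyd's K-means, and scores accuracy under the best one-to-one matching of cluster IDs. The disk-versus-square images, the helper names (`add_noise_at_snr`, `pca_kmeans`, `clustering_accuracy`, `demo`) and all parameters are hypothetical; the sketch does not reproduce the paper's 50,000-image simulation protocol and models neither the CTF nor image alignment.

```python
import itertools
import numpy as np

def add_noise_at_snr(images, snr, rng):
    """Add white Gaussian noise so that var(signal) / var(noise) == snr (assumed definition)."""
    noise_sd = np.sqrt(images.var() / snr)
    return images + rng.normal(0.0, noise_sd, size=images.shape)

def pca_kmeans(data, n_classes, n_components=10, n_iter=100, rng=None):
    """Baseline: project onto the top principal components, then run Lloyd's K-means."""
    rng = np.random.default_rng() if rng is None else rng
    centered = data - data.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    z = centered @ vt[:n_components].T
    centroids = z[rng.choice(len(z), n_classes, replace=False)]
    for _ in range(n_iter):
        # Assign each projected image to its nearest centroid.
        labels = ((z[:, None, :] - centroids[None]) ** 2).sum(-1).argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster becomes empty.
        new = np.array([z[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(n_classes)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels

def clustering_accuracy(true_labels, pred_labels, n_classes):
    """Fraction correctly classified under the best one-to-one matching of cluster IDs."""
    best = 0.0
    for perm in itertools.permutations(range(n_classes)):
        mapped = np.array(perm)[pred_labels]
        best = max(best, float(np.mean(mapped == true_labels)))
    return best

def demo(snr, n_per_class=100, seed=0):
    """Hypothetical two-class toy set (16x16 disk vs. square) clustered at a given SNR."""
    rng = np.random.default_rng(seed)
    yy, xx = np.mgrid[:16, :16]
    disk = ((yy - 7.5) ** 2 + (xx - 7.5) ** 2 < 30).astype(float).ravel()
    square = ((np.abs(yy - 7.5) < 4) & (np.abs(xx - 7.5) < 4)).astype(float).ravel()
    clean = np.vstack([np.tile(disk, (n_per_class, 1)), np.tile(square, (n_per_class, 1))])
    truth = np.repeat([0, 1], n_per_class)
    noisy = add_noise_at_snr(clean, snr, rng)
    pred = pca_kmeans(noisy, n_classes=2, rng=rng)
    return clustering_accuracy(truth, pred, n_classes=2)
```

Comparing `demo(10.0)` with `demo(0.005)` shows the qualitative trend of accuracy falling with SNR on this toy set; these values are not the paper's results. The brute-force permutation search in `clustering_accuracy` is only practical for a handful of classes; with many classes, `scipy.optimize.linear_sum_assignment` solves the same matching problem efficiently.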


Acknowledgments

Single-particle cryo-EM experiments were performed in part at the Center for Nanoscale Systems at Harvard University, a member of the National Nanotechnology Coordinated Infrastructure Network (NNCI), which is supported by the National Science Foundation of the USA under NSF award no. Data processing was performed in part on the Sullivan cluster at Dana-Farber Cancer Institute, which is funded in part by a gift from Mr and Mrs Daniel J. Jr. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Intel Corporation provided support in the form of salaries for the authors C.C. and B.B., but did not have any additional role in the study design, data collection and analysis, decision to publish, or preparation of the manuscript. The specific roles of these authors are articulated in the ‘author contributions’ section.
