Cryo-electron tomography is a rapidly developing field for studying macromolecular complexes in their native environments and has the potential to revolutionize our understanding of protein function. However, fast and accurate identification of particles in cryo-tomograms is challenging and represents a significant bottleneck in downstream processes such as subtomogram averaging. Here, we present tomoCPT (Tomogram Centroid Prediction Tool), a transformer-based solution that reformulates particle detection as a centroid-prediction task using Gaussian labels. Our approach, which is built upon the SwinUNETR architecture, demonstrates superior performance compared with both conventional binary labelling strategies and template matching. We show that tomoCPT effectively generalizes to novel particle types through zero-shot inference and can be significantly enhanced through fine-tuning with limited data. The efficacy of tomoCPT is validated using three case studies: apoferritin, achieving a resolution of 3.0 Å compared with 3.3 Å using template matching; SARS-CoV-2 spike proteins on cell surfaces, yielding an 18.3 Å resolution map where template matching proved unsuccessful; and rubisco molecules within carboxysomes, reaching 8.0 Å resolution. These results demonstrate the ability of tomoCPT to handle varied scenarios, including densely packed environments and membrane-bound proteins. The implementation of the tool as a command-line program, coupled with its minimal data requirements for fine-tuning, makes it a practical solution for high-throughput cryo-ET data-processing workflows.
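To illustrate what centroid prediction with Gaussian labels can look like in practice, the following is a minimal sketch of how a training target might be built from known particle centroids. It is not the tomoCPT implementation: the function name `gaussian_label_volume`, the `sigma` value, and the choice to merge overlapping blobs with a voxel-wise maximum are all assumptions made for illustration.

```python
import numpy as np


def gaussian_label_volume(shape, centroids, sigma=2.0):
    """Return a float32 volume with a 3D Gaussian blob at each centroid.

    Hypothetical helper: `shape` is (z, y, x) in voxels, `centroids` is an
    iterable of (z, y, x) coordinates, and `sigma` is the blob width in
    voxels (an assumed value, not one taken from the paper).
    """
    # Coordinate grids, one per axis, each with the full volume shape.
    zz, yy, xx = np.meshgrid(*(np.arange(n) for n in shape), indexing="ij")
    target = np.zeros(shape, dtype=np.float32)
    for cz, cy, cx in centroids:
        # Squared Euclidean distance of every voxel from this centroid.
        d2 = (zz - cz) ** 2 + (yy - cy) ** 2 + (xx - cx) ** 2
        blob = np.exp(-d2 / (2.0 * sigma ** 2)).astype(np.float32)
        # Voxel-wise maximum keeps each peak at 1.0 where blobs overlap.
        np.maximum(target, blob, out=target)
    return target


# Two particles in a small 16^3 volume.
target = gaussian_label_volume((16, 16, 16), [(4, 4, 4), (10, 12, 8)])
```

Unlike a binary mask, a target of this kind peaks exactly at each centroid and decays smoothly away from it, so a network trained to regress it produces a map whose local maxima can be read off as candidate particle positions.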