Abstract
Cryo-electron tomography (cryo-ET) is confronted with the intricate task of unveiling novel structures. General class discovery (GCD) seeks to identify new classes by learning a model that can pseudo-label unannotated (novel) instances solely using supervision from labeled (base) classes. While 2D GCD for image data has made strides, its 3D counterpart remains unexplored. Traditional methods encounter challenges due to model bias and limited feature transferability when clustering unlabeled 2D images into known and potentially novel categories based on labeled data. To address this limitation and extend GCD to 3D structures, we propose an innovative approach that harnesses a pretrained 2D transformer, enriched by an effective weight inflation strategy tailored for 3D adaptation, followed by a decoupled prototypical network. Incorporating the power of pretrained weight-inflated Transformers, we further integrate CLIP, a vision-language model to incorporate textual information. Our method synergizes a graph convolutional network with CLIP's frozen text encoder, preserving class neighborhood structure. In order to effectively represent unlabeled samples, we devise semantic distance distributions, by formulating a bipartite matching problem for category prototypes using a decoupled prototypical network. Empirical results unequivocally highlight our method's potential in unveiling hitherto unknown structures in cryo-ET. By bridging the gap between 2D GCD and the distinctive challenges of 3D cryo-ET data, our approach paves novel avenues for exploration and discovery in this domain.
Published Version
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have