Classification of helical polymers with deep-learning language models

Daoyi Li, Wen Jiang

doi:10.1016/j.jsb.2023.108041

Abstract

Many macromolecules in biological systems exist in the form of helical polymers. However, the inherent polymorphism and heterogeneity of samples complicate the reconstruction of helical polymers from cryo-EM images. Currently available 2D classification methods are effective at separating particles of interest from contaminants, but they do not effectively differentiate between polymorphs, resulting in heterogeneity in the 2D classes. It is therefore crucial to develop a method that can computationally divide a dataset of polymorphic helical structures into homogeneous subsets. In this work, we utilized deep-learning language models to embed the filaments as vectors in hyperspace and group them into clusters. Tests with both simulated and experimental datasets demonstrate that our method, HLM (Helical classification with Language Model), can effectively distinguish different types of filaments, even in the presence of many contaminants and at low signal-to-noise ratios. We also demonstrate that HLM can isolate homogeneous subsets of particles from a publicly available dataset, resulting in the discovery of a previously unreported filament variant with an extra density around the tau filaments.
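
The embed-then-cluster idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how filaments are turned into model input, so the sketch assumes each filament is given as a sequence of 2D class labels for its segments, and it swaps in a plain label histogram for the learned language-model embedding and a small k-means for the clustering step. All names (`embed`, `kmeans`, the toy `filaments`) are hypothetical.

```python
from collections import Counter
from math import sqrt

def embed(filament, n_classes):
    """Stand-in embedding: normalized histogram of the class labels in one filament."""
    counts = Counter(filament)
    total = len(filament)
    return [counts.get(c, 0) / total for c in range(n_classes)]

def dist(a, b):
    """Euclidean distance between two vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(vectors, k, n_iter=20):
    """Small k-means with deterministic farthest-point initialization."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        # Pick the vector farthest from its nearest existing centroid.
        far = max(vectors, key=lambda v: min(dist(v, c) for c in centroids))
        centroids.append(far)
    labels = [0] * len(vectors)
    for _ in range(n_iter):
        # Assign each vector to its nearest centroid.
        labels = [min(range(k), key=lambda j: dist(v, centroids[j])) for v in vectors]
        # Move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy data (assumed): each filament is a sequence of 2D class labels of its segments.
filaments = [
    [0, 1, 0, 1, 0],        # mostly classes 0 and 1
    [1, 0, 1, 0],
    [2, 3, 3, 2, 2, 3],     # mostly classes 2 and 3
    [3, 2, 3, 2],
    [0, 0, 1, 1, 0, 1],
]
vectors = [embed(f, n_classes=4) for f in filaments]
labels = kmeans(vectors, k=2)
print(labels)
```

In this toy case the filaments built from classes 0 and 1 end up in one cluster and those built from classes 2 and 3 in the other. A learned embedding would additionally capture the order and context of the labels, which a histogram ignores.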

Journal: Journal of Structural Biology
Publication Date: Nov 7, 2023
License type: cc-by-nd

Similar Papers

Author response: Assembly of recombinant tau into filaments identical to those of Alzheimer’s disease and chronic traumatic encephalopathy
Sofia Lövestam ... Sjors HW Scheres
03 Mar 2022

Filament size influences temperature changes and brain damage following middle cerebral artery occlusion in rats.
Hajnalka Ábrahám ... Akira Arimura
Experimental brain research | VOL. 142
31 Oct 2001

How well do pre-trained contextual language representations recommend labels for GitHub issues?
Jun Wang ... Lin Chen
Knowledge-Based Systems | VOL. 232
10 Sep 2021

Learning to Read and Write in the Language of Proteins
Helen T Hobbs ... Chang C Liu
GEN Biotechnology | VOL. 2
01 Apr 2023
