FoldHSphere: deep hyperspherical embeddings for protein fold recognition

Amelia Villegas-Morcillo, Angel M Gomez, Victoria Sanchez

doi:10.1186/s12859-021-04419-7

Open Access

https://doi.org/10.1186/s12859-021-04419-7

Journal: BMC Bioinformatics
Publication Date: Oct 12, 2021
Citations: 9
License type: open-access

Affiliation: University of Granada

Abstract

Background

Current state-of-the-art deep learning approaches for protein fold recognition learn protein embeddings that improve prediction performance at the fold level. However, there is still a performance gap between the fold level and the (relatively easier) family level, suggesting that it might be possible to learn an embedding space that better represents the protein folds.

Results

In this paper, we propose the FoldHSphere method to learn a better fold embedding space through a two-stage training procedure. We first obtain prototype vectors for each fold class that are maximally separated in hyperspherical space. We then train a neural network by minimizing the angular large margin cosine loss to learn protein embeddings clustered around the corresponding hyperspherical fold prototypes. Our network architectures, ResCNN-GRU and ResCNN-BGRU, process the input protein sequences by applying several residual-convolutional blocks followed by a gated recurrent unit-based recurrent layer. Evaluation results on the LINDAHL dataset indicate that the use of our hyperspherical embeddings effectively bridges the performance gap between the family and fold levels. Furthermore, our FoldHSpherePro ensemble method yields an accuracy of 81.3% at the fold level, outperforming all state-of-the-art methods.

Conclusions

Our methodology is efficient in learning discriminative and fold-representative embeddings for the protein domains. The proposed hyperspherical embeddings are effective at identifying the protein fold class by pairwise comparison, even when amino acid sequence similarities are low.
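To make the second training stage more concrete, the following is a minimal NumPy sketch of a large margin cosine loss computed against fixed, unit-norm class prototypes. It is an illustration under stated assumptions, not the authors' implementation: the function names, the scale and margin values, and the random prototypes (a stand-in for the maximally separated prototypes obtained in the first stage) are all hypothetical and do not come from the abstract.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Project vectors onto the unit hypersphere."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)

def lmcl_loss(embeddings, prototypes, labels, scale=30.0, margin=0.35):
    """Large margin cosine loss against fixed class prototypes.

    scale and margin are illustrative values, not taken from the paper.
    """
    e = l2_normalize(embeddings)          # (N, D) unit-norm embeddings
    p = l2_normalize(prototypes)          # (K, D) unit-norm prototypes
    cos = e @ p.T                         # (N, K) cosine similarities
    logits = scale * cos
    rows = np.arange(len(labels))
    # Subtract the margin from the cosine of the true class only.
    logits[rows, labels] = scale * (cos[rows, labels] - margin)
    # Numerically stable log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-log_probs[rows, labels].mean())

# Assumption: random unit vectors stand in for the separated fold prototypes.
rng = np.random.default_rng(0)
num_folds, dim = 5, 16
prototypes = l2_normalize(rng.normal(size=(num_folds, dim)))
labels = np.array([0, 1, 2, 3, 4])

# Embeddings lying on their own prototype versus on a wrong prototype.
aligned = prototypes[labels]
shuffled = prototypes[(labels + 1) % num_folds]

loss_aligned = lmcl_loss(aligned, prototypes, labels)
loss_shuffled = lmcl_loss(shuffled, prototypes, labels)
print(loss_aligned < loss_shuffled)
```

In this sketch, embeddings that coincide with their own fold prototype incur a lower loss than embeddings placed on a different prototype, which is the clustering behavior the abstract describes. In a real training setup the embeddings would be produced by the network and the loss minimized by gradient descent.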

Full Text