Lightweight ProteinUnet2 network for protein secondary structure prediction: a step towards proper evaluation

Katarzyna Stapor, Tomasz Smolarczyk, Irena Roterman, Krzysztof Kotowski

doi:10.1186/s12859-022-04623-z


Open Access


Abstract

Background

The prediction of protein secondary structure (SS) is a crucial step in ab initio tertiary structure prediction, which provides information about protein activity and function. Because experimental methods are expensive and sometimes impossible, many SS predictors, mainly based on machine learning methods, have been proposed over the years. Currently, most of the top methods use evolutionary-based input features derived from PSSM and HHblits, although quite recently embeddings, a new description of protein sequences generated by language models (LM), have appeared and could be leveraged as input features. Apart from the calculation of input features, the top models usually need extensive computational resources for training and prediction and can barely be run on a regular PC. As an imbalanced classification problem, SS prediction should not be judged by the commonly used Q3/Q8 metrics. Moreover, because the benchmark datasets are not random samples, classical statistical null hypothesis testing based on the Neyman–Pearson approach is not appropriate.

Results

We present ProteinUnet2, a lightweight deep network for SS prediction based on the U-Net convolutional architecture and on evolutionary-based input features (from PSSM and HHblits) as well as SPOT-Contact features. Through an extensive evaluation study, we compare the performance of ProteinUnet2 with that of the top SS prediction methods based on evolutionary information (SAINT and SPOT-1D). We also propose a new statistical methodology for assessing prediction performance, based on statistical significance from Fisher–Pitman permutation tests accompanied by practical significance measured by Cohen's effect size.

Conclusions

Our results suggest that the ProteinUnet2 architecture has much shorter training and inference times while maintaining results similar to those of the SAINT and SPOT-1D predictors. Given the relatively long time needed to calculate evolutionary-based features (from PSSM in particular), it would be worth testing the predictive ability of embeddings as input features in the future. We strongly believe that the statistical methodology proposed here for evaluating SS prediction results will be adopted and used (and even expanded) by the research community.
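The evaluation methodology named in the abstract, a permutation test for statistical significance paired with Cohen's effect size for practical significance, can be illustrated with a minimal sketch. The abstract does not give the exact test statistic, whether the comparison is paired, or which variant of Cohen's d is used, so everything below is an assumption made for illustration: the per-protein scores are made-up numbers, the test is a paired sign-flip permutation test on the mean difference, and the effect size is the mean difference divided by the standard deviation of the differences. The function names `paired_permutation_test` and `cohens_d_paired` are hypothetical.

```python
import random
import statistics


def paired_permutation_test(scores_a, scores_b, n_resamples=10000, seed=0):
    """Two-sided Monte Carlo permutation test on the mean paired difference.

    Under the null hypothesis the two predictors are exchangeable for each
    protein, so the sign of every per-protein difference can be flipped.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("scores_a and scores_b must have the same length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(statistics.fmean(diffs))
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_resamples):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        # Small tolerance guards against floating-point ties.
        if abs(statistics.fmean(flipped)) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_resamples + 1)


def cohens_d_paired(scores_a, scores_b):
    """Paired-sample Cohen's d: mean difference over the SD of differences.

    Assumes the differences are not all identical (non-zero SD).
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    return statistics.fmean(diffs) / statistics.stdev(diffs)


# Hypothetical per-protein scores for two predictors on the same test set.
model_a = [0.81, 0.77, 0.90, 0.68, 0.85, 0.79, 0.88, 0.73]
model_b = [0.80, 0.75, 0.91, 0.66, 0.84, 0.80, 0.86, 0.72]

p_value = paired_permutation_test(model_a, model_b)
effect = cohens_d_paired(model_a, model_b)
print(f"p-value: {p_value:.4f}, Cohen's d: {effect:.3f}")
```

Reporting both numbers separates two questions: whether a difference between predictors is unlikely under random relabelling, and whether it is large enough to matter in practice.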


Journal: BMC Bioinformatics
Publication Date: Mar 22, 2022
Citations: 1
License type: open-access


