Discovering the Ultimate Limits of Protein Secondary Structure Prediction.

Chia-Tzu Ho, Yu-Wei Huang, Teng-Ruei Chen, Wei-Cheng Lo, Chia-Hua Lo

doi:10.3390/biom11111627

Open Access

Journal: Biomolecules	Publication Date: Nov 3, 2021
Citations: 7	License type: CC BY 4.0

Affiliation: National Yang Ming Chiao Tung University

Abstract

Secondary structure prediction (SSP) of proteins is an important structural biology technique with many applications. About 300 algorithms have been published over the past seven decades, with fierce competition in accuracy. In the first 60 years, the accuracy of three-state SSP rose from ~56% to 81%; since then, it has remained at 81–86%. In the 1990s, the theoretical limit of three-state SSP accuracy was estimated to be 88%. Thus, SSP is now generally considered either no longer challenging or too challenging to improve. However, we found that the limit of three-state SSP may have been underestimated. Moreover, there is still much room for improving segment-based and eight-state SSPs, but the limits of these emerging topics have not been determined. This work performs large-scale sequence and structural analyses to estimate SSP accuracy limits and assess state-of-the-art SSP methods. The limit of three-state SSP is re-estimated to be ~92%, 4–5% higher than previously expected, indicating that SSP is still challenging. The estimated limit of eight-state SSP is 84–87%. Several proposals for improving future SSP algorithms are made based on our results. We hope that these findings will help advance the development of SSP and all its applications.

Highlights

Considering the current homology-based methodology, we proposed that the practical limits of secondary structure prediction (SSP) methods that work with PSI-BLAST position-specific scoring matrix (PSSM) and PSSM reference datasets of
Since the query set we utilized was over 10 times larger than those of previous works, random sampling allowed 10 repeats of the experiment.
The SSP methods performed evenly across different sizes, with accuracy decreasing slightly as the size increased. These results suggest that overcoming the difficulties posed by long-range interacting β-sheets and monotonic helix codes will be key to improving future SSP algorithms.

Summary

Introduction

Protein secondary structure prediction (SSP) is the prediction of the per-residue backbone conformation of a protein from its amino acid sequence. It is an essential structural biology technique with a variety of applications. We believe that precisely determining the limit of SSP will help re-energize this field, set new directions for SSP development, and benefit all applications relying on SSP. We performed exhaustive pairwise sequence and structure alignments on protein structure databases and estimated the theoretical limits of three- and eight-state SSPs. In addition, the experimental results revealed valuable information for future SSP development.
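
To make the three-state and eight-state accuracy figures concrete, the following is a minimal sketch of how per-residue accuracy is typically computed. It assumes the widely used reduction of DSSP eight-state codes to three states (H, G, I to helix; E, B to strand; everything else to coil); the paper's exact mapping may differ, and the function names and example strings are hypothetical.

```python
# Assumed convention: a common DSSP eight-state to three-state reduction.
# The mapping actually used in the paper may differ.
EIGHT_TO_THREE = {
    "H": "H", "G": "H", "I": "H",  # helix types -> H
    "E": "E", "B": "E",            # strand and isolated bridge -> E
    "T": "C", "S": "C", "-": "C",  # turn, bend, other -> C (coil)
}


def reduce_to_three_state(ss8: str) -> str:
    """Convert an eight-state string into a three-state string."""
    return "".join(EIGHT_TO_THREE[state] for state in ss8)


def per_residue_accuracy(predicted: str, observed: str) -> float:
    """Fraction of residues whose predicted state equals the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("sequences must not be empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


# Hypothetical example strings, not data from the paper.
observed_ss8 = "-HHHHGGT-EEEE-"
predicted_ss8 = "-HHHHHHT-EEBE-"

q8 = per_residue_accuracy(predicted_ss8, observed_ss8)
q3 = per_residue_accuracy(
    reduce_to_three_state(predicted_ss8),
    reduce_to_three_state(observed_ss8),
)
print(f"Q8 = {q8:.3f}, Q3 = {q3:.3f}")
```

In this toy case the prediction confuses G with H and E with B, which counts as errors in the eight-state comparison but not after reduction to three states. This illustrates why eight-state accuracy can be expected to sit below three-state accuracy for the same prediction.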
