Abstract

Protein structure prediction is a long-standing unsolved problem in molecular biology that has seen renewed interest following the success of deep learning with AlphaFold at CASP13. While developing and evaluating protein structure prediction methods, researchers may want to identify the known structures most similar to their predicted structures. These predicted structures often have low sequence and structure similarity to known structures. We show how RUPEE, a purely geometric protein structure search, is able to identify the structures most similar to structure predictions regardless of how they vary from known structures, something existing protein structure searches struggle with. RUPEE accomplishes this through a novel linear encoding of protein structures as sequences of residue descriptors. Using a fast Needleman-Wunsch algorithm, RUPEE is able to perform alignments on the sequences of residue descriptors for every available structure. This is followed by a series of increasingly accurate structure alignments, from TM-align alignments initialized with the Needleman-Wunsch residue descriptor alignments to standard TM-align alignments of the final results. By using alignment normalization effectively at each stage, RUPEE can also execute containment searches, in addition to full-length searches, to identify structural motifs within proteins. We compare the results of RUPEE to those of the protein structure searches mTM-align, SSM, CATHEDRAL, and VAST using a benchmark derived from the protein structure predictions submitted to CASP13. On average, RUPEE identifies better alignments with respect to TM-score as well as the scores specific to SSM and CATHEDRAL, Q-score and SSAP-score, respectively.
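
To make the first stage concrete, the following is a minimal sketch of a Needleman-Wunsch global alignment score computed over two sequences of residue descriptors. It is an illustration only, not RUPEE's implementation: the integer descriptor codes, the function name `needleman_wunsch_score`, and the match, mismatch, and gap values are all assumptions chosen for the example.

```python
from typing import Sequence


def needleman_wunsch_score(
    a: Sequence[int],
    b: Sequence[int],
    match: int = 1,      # assumed reward for identical descriptors
    mismatch: int = -1,  # assumed penalty for differing descriptors
    gap: int = -1,       # assumed linear gap penalty
) -> int:
    """Return the optimal global alignment score of two descriptor sequences.

    Descriptors are modeled as plain integers (a hypothetical encoding).
    Only two rows of the dynamic-programming table are kept in memory.
    """
    # Row 0: aligning an empty prefix of `a` against each prefix of `b`.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        # Column 0: aligning the first i descriptors of `a` against nothing.
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                prev[j - 1] + sub,  # align a[i-1] with b[j-1]
                prev[j] + gap,      # gap in b
                curr[j - 1] + gap,  # gap in a
            )
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    query = [4, 7, 7, 2, 9]     # hypothetical descriptor codes
    candidate = [4, 7, 2, 9]
    print(needleman_wunsch_score(query, candidate))
```

In a search setting, a score like this would be computed between the query and each database structure, with the top-scoring candidates passed on to the more expensive structure alignment stages.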
