AngularQA: Protein Model Quality Assessment with LSTM Networks

Matthew Conover, Miao Sun, Max Staples, Renzhi Cao, Dong Si

doi:10.1515/cmb-2019-0001

Abstract

Quality Assessment (QA) plays an important role in protein structure prediction. Traditional multi-model QA methods usually rely on searching databases or comparing a model with other models to make predictions, and they often fail when poor-quality models dominate the model pool. We propose a novel single-model protein QA method built on a new representation that converts raw atom information into a series of carbon-alpha (Cα) atoms with side-chain information, defined by their dihedral angles and bond lengths to the prior residue. An LSTM network predicts quality by treating each amino acid as a time-step and using the final value returned by the LSTM cells. To the best of our knowledge, this is the first attempt to apply an LSTM model to the QA problem; furthermore, the representation we use has not previously been studied for QA. In addition to angles, we use sequence properties such as secondary structure, parsed from the protein structure at each time-step without using any database, which distinguishes our method from all existing QA methods. Our model achieves an overall correlation of 0.651 on the CASP12 testing dataset. Our experiments point to new directions for the QA problem, and our method could be widely used for protein structure prediction. The software is freely available on GitHub: https://github.com/caorenzhi/AngularQA
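The angular representation described above can be illustrated with a short geometric sketch. It assumes the structure has already been reduced to an ordered list of Cα coordinates (in Å) and computes, for each residue, the pseudo-bond length to the prior residue and the Cα pseudo-dihedral over residues i-3..i, encoded as sine and cosine to avoid the ±180° wrap-around. The function names (`dihedral`, `ca_features`), the zero padding for the first residues, and the sin/cos encoding are illustrative assumptions; the side-chain and secondary-structure features are not reproduced, and this is not the AngularQA implementation.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed dihedral angle in degrees about the p1-p2 axis, in [-180, 180]."""
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)  # normals of the two planes
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def ca_features(ca: Sequence[Vec]) -> List[List[float]]:
    """Hypothetical per-residue rows [bond_length, sin(dihedral), cos(dihedral)].

    bond_length: distance from Cα i to Cα i-1 (0.0 for the first residue).
    dihedral: Cα pseudo-torsion over residues i-3..i (sin/cos are 0.0 when undefined).
    """
    feats: List[List[float]] = []
    for i, p in enumerate(ca):
        bond = math.dist(p, ca[i - 1]) if i >= 1 else 0.0
        if i >= 3:
            rad = math.radians(dihedral(ca[i - 3], ca[i - 2], ca[i - 1], p))
            feats.append([bond, math.sin(rad), math.cos(rad)])
        else:
            feats.append([bond, 0.0, 0.0])
    return feats
```

For example, four coplanar Cα atoms in a trans arrangement, `[(3.8, 0, 0), (0, 0, 0), (0, 3.8, 0), (-3.8, 3.8, 0)]`, give a final row of approximately `[3.8, 0.0, -1.0]`. Each row is one time-step for a sequence model, so a protein of length L becomes an L-step sequence.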

Highlights

Protein folding prediction proves to be a major hurdle in modern biology (Wei and Zou 2016)
We propose a novel protein single-model Quality Assessment (QA) method built on a new representation that converts raw atom information into a series of carbon-alpha (Cα) atoms with side-chain information, defined by their dihedral angles and bond lengths to the prior residue
To the best of our knowledge, this is the first attempt to use an LSTM model for the QA problem; the representation we use has not previously been studied for QA
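The LSTM formulation from the abstract, one amino acid per time-step with the quality score read from the final step, can be sketched in plain Python. Everything here is an assumption for illustration: the class name `TinyLSTMRegressor`, the hidden size, the random untrained weights, and the sigmoid output layer that maps the final hidden state to a score in (0, 1). The actual AngularQA network size, training target, and loss are not given in this section, and in a real model the weights would be learned from training data rather than drawn at random.

```python
import math
import random
from typing import Callable, List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


class TinyLSTMRegressor:
    """Hypothetical single-layer LSTM: one feature row per residue, score from the last step."""

    def __init__(self, n_in: int, n_hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.n_in, self.n_hidden = n_in, n_hidden
        cols = n_in + n_hidden  # each gate sees the concatenation [x_t, h_{t-1}]
        # One weight matrix and bias vector per gate:
        # input (i), forget (f), cell candidate (g), output (o).
        self.W = {k: [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                      for _ in range(n_hidden)] for k in "ifgo"}
        self.b = {k: [0.0] * n_hidden for k in "ifgo"}
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)]
        self.b_out = 0.0

    def _gate(self, k: str, z: List[float], act: Callable[[float], float]) -> List[float]:
        # act(W_k · z + b_k), computed row by row
        return [act(sum(w * v for w, v in zip(row, z)) + b)
                for row, b in zip(self.W[k], self.b[k])]

    def predict(self, seq: Sequence[Sequence[float]]) -> float:
        h = [0.0] * self.n_hidden
        c = [0.0] * self.n_hidden
        for x in seq:  # one time-step per amino acid
            if len(x) != self.n_in:
                raise ValueError(f"expected {self.n_in} features, got {len(x)}")
            z = list(x) + h
            i = self._gate("i", z, sigmoid)
            f = self._gate("f", z, sigmoid)
            g = self._gate("g", z, math.tanh)
            o = self._gate("o", z, sigmoid)
            c = [ft * ct + it * gt for ft, ct, it, gt in zip(f, c, i, g)]
            h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
        # Only the final hidden state reaches the output layer.
        return sigmoid(sum(w * v for w, v in zip(self.w_out, h)) + self.b_out)
```

Usage: `TinyLSTMRegressor(n_in=3, n_hidden=8).predict(rows)` returns one scalar per structural model. Because the recurrence runs once per residue and only the last hidden state is scored, the same network handles proteins of any length without padding to a fixed size.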

Summary

Introduction

Protein folding prediction proves to be a major hurdle in modern biology (Wei and Zou 2016). While great progress has been made in computational prediction methods with the help of machine learning techniques (Manavalan et al 2017; Lai et al 2017; Peterson et al 2017; Shin, Christoffer, and Kihara 2017; D. Li, Ju, and Zou 2016; Wei et al 2015; Dao et al 2018; C.-Q. Feng et al 2018; Chen et al 2019; Tang et al 2018; Yang et al 2018; Huang, Smolensky, et al 2018; Huang, Zhang, et al 2018; Manavalan, Basith, et al 2018; Basith et al 2018; Manavalan, Shin, et al 2018; Chen et al 2017; P.-M. Feng et al 2013), a long journey still remains.

Methods

Results

Conclusion