Abstract

We describe a large-scale application of a back-propagation neural network to the analysis, classification, and prediction of protein secondary and tertiary structure from sequence information alone. A back-propagation network called BigNet has been implemented, along with a Network Description Language (NDL), on the 512 MWord Cray 2 at the Minnesota Supercomputer Center. The proof-of-concept experiments described here used a small, heterologous training set of small protein structures (15 proteins, each with fewer than 133 residues) from the Brookhaven Protein Data Bank (PDB). Simulations with one hidden layer and 0.5 to 10 million connections execute at three to five million connection updates per second in full back-propagation learning mode and routinely converge to solutions in which input of a hydrophobicity-coded sequence yields output distance matrices with 0.3% to 1.5% RMS deviation from the actual distance matrices. Although the training set is too small to expect useful generalization, some evidence of generalization appeared in the similar learning progress of homologous pairs within the training set and in the production of novel distance-matrix outputs when novel input sequences were presented. The discussion addresses limitations of the current implementation, plans for software improvements, and characteristics of future training sets.
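
To make the reported error measure concrete, the following is a minimal sketch of how a distance matrix and a percentage RMS deviation between a predicted and an actual matrix might be computed. The abstract does not state how the percentage is normalized; dividing by the largest actual distance is an assumption made here for illustration, and the function names and toy coordinates are hypothetical rather than taken from BigNet or NDL.

```python
import math

def distance_matrix(coords):
    """Pairwise Euclidean distances between 3-D points (e.g. one point per residue)."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def rms_percent_deviation(predicted, actual):
    """RMS difference between two square matrices of equal size, as a percentage.

    Assumption: the RMS value is normalized by the largest actual distance.
    The source does not specify the normalization it used.
    """
    n = len(actual)
    squared_error = sum(
        (predicted[i][j] - actual[i][j]) ** 2
        for i in range(n)
        for j in range(n)
    )
    rms = math.sqrt(squared_error / (n * n))
    scale = max(max(row) for row in actual)
    return 100.0 * rms / scale

# Toy example: three hypothetical residue positions (arbitrary units).
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (3.0, 4.0, 0.0)]
actual = distance_matrix(coords)

# Stand-in for a network output: every entry is off by a constant 0.05.
predicted = [[d + 0.05 for d in row] for row in actual]

deviation = rms_percent_deviation(predicted, actual)
print(f"RMS deviation: {deviation:.2f}%")
```

Under this assumed normalization, a uniform error of 0.05 against a maximum distance of 5 gives a deviation of about 1%, which falls inside the 0.3% to 1.5% range quoted above.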
