Abstract

Accurate determination of protein secondary structure from the chemical shift information is a key step for NMR tertiary structure determination. Relatively few work has been done on this subject. There needs to be a systematic investigation of algorithms that are (a) robust for large datasets; (b) easily extendable to (the dynamic) new databases; and (c) approaching to the limit of accuracy. We introduce new approaches using k-nearest neighbor algorithm to do the basic prediction and use the BCJR algorithm to smooth the predictions and combine different predictions from chemical shifts and based on sequence information only. Our new system, SUCCES, improves the accuracy of all existing methods on a large dataset of 805 proteins (at 86% Q(3) accuracy and at 92.6% accuracy when the boundary residues are ignored), and it is easily extendable to any new dataset without requiring any new training. The software is publicly available at http://monod.uwaterloo.ca/nmr/succes.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call