Rank information: A structure‐independent measure of evolutionary trace quality that improves identification of protein functional sites

Hui Yao, Ivana Mihalek, Olivier Lichtarge

doi:10.1002/prot.21101

Abstract

Protein functional sites are key targets for drug design and protein engineering, but their large-scale experimental characterization remains difficult. The evolutionary trace (ET) is a computational approach to this problem that has been useful in a variety of case studies, but its proteome-scale application is partially hindered because automated retrieval of input sequences from databases often includes some sequences with errors that degrade functional site identification. To recognize and purge these sequences, this study introduces a novel, structure-free measure of ET quality called rank information (RI). RI is shown to decrease in response to errors in sequences, alignments, or functional classifications. Conversely, an automated procedure that increases RI by selectively removing sequences improves functional site identification so that it nearly matches manually curated traces in kinases and in a test set of 79 diverse proteins. We therefore conclude that RI partially reflects the evolutionary consistency of sequence, structure, and function. In practice, as the size of the proteome continues to grow exponentially, RI increases the accuracy of ET for large-scale automated annotation of protein functional sites.

Full Text