Abstract

Over the past two decades, advances in rapid genomic sequencing have massively expanded the space of known protein sequences, reaching more than 200 million sequences in recent versions of the TrEMBL database. Making optimal use of this wealth of sequencing information, however, requires considerable advances in our ability to predict biomolecular function from sequence information. While computational function predictions based purely on sequence similarity are highly effective when amino acid sequence identity is high, such predictors struggle when no high-identity annotation templates are available. In such cases, we and others have shown that the integration of additional information such as expression profiles and protein structure can substantially improve annotation accuracy.

