Abstract

Over the past two decades, advances in rapid genomic sequencing have massively expanded the space of known protein sequences: recent releases of the TrEMBL database contain more than 200 million sequences. Making full use of this wealth of sequence data, however, requires considerable advances in our ability to predict biomolecular function from sequence. Computational function predictions based purely on sequence similarity are highly effective when amino acid sequence identity is high, but they struggle when no high-identity annotation templates are available. In such cases, we and others have shown that integrating additional information, such as expression profiles and protein structure, can substantially improve annotation accuracy.