Abstract

Protein functional annotation consists of associating proteins with textual descriptors that elucidate their biological roles. The bulk of annotation is done via automated procedures that ultimately rely on annotation transfer. Despite the large number of existing protein annotation procedures, the ever-growing protein space is never completely annotated. One facet of annotation incompleteness derives from annotation uncertainty: often, when protein function cannot be predicted with enough specificity, the protein is instead conservatively annotated with more generic terms. For protein families or functionally related (or even dissimilar) sets, this makes it harder to use annotations to compare the extent of functional relatedness among all family or set members. However, we postulate that identifying subsets of functionally coherent proteins annotated at a very specific level can help extend the annotation of other incompletely annotated proteins within the same family or functionally related set. As an example, we analyse the annotation status of a set of CAZy families belonging to the Polysaccharide Lyase class. We show that, through the use of visualization methods and semantic-similarity-based metrics, it is possible to identify families, and the annotation terms within them, that are suitable for annotation extension. Based on our analysis, we then propose a semi-automatic methodology leading to the extension of single annotation terms within these partially annotated protein sets or families.

Highlights

  • The continuous development of high-throughput methodologies for biological molecule sequencing has led to an increase in the amount of raw biological data in need of further processing

  • Closer inspection of the actual annotation distribution within those families reveals that families PL5, PL16 and PL17 are functionally mono-specific

  • This means that, for each of those families, there is only a single and common molecular function activity performed by their proteins
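
The notion of a functionally mono-specific family can be illustrated with a small sketch. This is not the authors' pipeline: the family names, protein identifiers and activity labels below are hypothetical placeholders, and an empty set is assumed to stand for a protein that carries only generic annotation. The sketch flags families in which every specifically annotated protein shares one term, then lists the remaining members as candidates for annotation extension.

```python
# Hypothetical toy data (not from the paper): family -> {protein_id: specific terms}.
# An empty set is assumed to mean "annotated only with generic terms".
families = {
    "FAM_A": {"a1": {"activity_X"}, "a2": {"activity_X"}, "a3": set()},
    "FAM_B": {"b1": {"activity_X"}, "b2": {"activity_Y"}, "b3": set()},
}


def specific_terms(members):
    """Collect the distinct specific terms seen across a family's proteins."""
    terms = set()
    for annotated in members.values():
        terms |= annotated
    return terms


def is_mono_specific(members):
    """A family is mono-specific here if exactly one specific term occurs in it."""
    return len(specific_terms(members)) == 1


def extension_candidates(all_families):
    """Map each generically annotated protein in a mono-specific family
    to the single specific term shared by its annotated relatives."""
    proposals = {}
    for members in all_families.values():
        terms = specific_terms(members)
        if len(terms) != 1:
            continue  # poly-specific or unannotated family: no proposal
        (term,) = terms
        for protein, annotated in members.items():
            if not annotated:
                proposals[protein] = term
    return proposals


if __name__ == "__main__":
    for name, members in families.items():
        print(name, "mono-specific:", is_mono_specific(members))
    print(extension_candidates(families))
```

Because the methodology is described as semi-automatic, the proposals returned by `extension_candidates` would be suggestions for a curator to review, not annotations to apply directly.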


Summary

Introduction

The continuous development of high-throughput methodologies for biological molecule sequencing has led to an increase in the amount of raw biological data in need of further processing. Proteins are of particular interest given their participation in practically every process occurring within living cells. Their functions can range from structural or mechanical support to the catalysis of vital metabolic biochemical reactions. The most commonplace approach is the use of any of the several function prediction methodologies, relying on techniques ranging from sequence homology detection to text mining of the scientific literature. Most of these methodologies rely heavily on computational power and can range from partial to full automation, enabling them to handle the barrage of biological sequence data currently being made available.
