Abstract

Background: Advances in high-throughput technologies available to modern biology have created an increasing flood of experimentally determined facts. Ordering, managing and describing these raw results is the first step that allows facts to become knowledge. Currently there are few ways to annotate such data automatically, especially ways that draw on information deposited in the published literature.

Results: To help researchers describe results from high-throughput experiments, we developed HT-SAS, a web service for automatic annotation of proteins using general English words. For each protein, a pool of Medline abstracts connected to homologous proteins is gathered using the UniProt-Medline link. Overrepresented words are detected using a binomial statistical approximation. We tested the approach on a protein test set from SGD to determine its accuracy and usefulness. We also applied the service to improve the annotations of Plasmodium berghei proteins expressed exclusively during the blood stage.

Conclusion: Using HT-SAS, we created new annotations, or enriched existing ones, for over 20% of the Plasmodium berghei proteins expressed in the blood stage that are deposited in PlasmoDB. Our tests show that this approach to information extraction provides highly specific keywords, often even when the number of abstracts is limited. The service should be useful to manual curators, as a complement to manually curated information sources, and to researchers working with protein datasets, especially from poorly characterized organisms.
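The word-scoring step summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the HT-SAS implementation: the abstract says only that over-representation is assessed with a binomial approximation, so the use of the normal (z-score) approximation to the binomial, the background frequency table, the function names and the thresholds `min_z` and `min_count` are all hypothetical. Retrieval of the abstract pool through the UniProt-Medline link is assumed to have already happened.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def overrepresented_words(pool_abstracts, background_freq, min_z=3.0, min_count=2):
    """Rank words over-represented in one protein's abstract pool (illustrative sketch).

    pool_abstracts:  abstracts linked to the protein's homologues (assumed input).
    background_freq: word -> relative frequency in a general reference corpus
                     (hypothetical input; HT-SAS's actual background is not shown here).
    Scores use the normal approximation to the binomial:
        z = (k - N p) / sqrt(N p (1 - p))
    where k is the word count in the pool and N the total number of pool tokens.
    """
    counts = Counter(tok for text in pool_abstracts for tok in tokenize(text))
    n = sum(counts.values())  # total number of word tokens in the pool
    scored = []
    for word, k in counts.items():
        p = background_freq.get(word)
        # Skip rare pool words and words lacking a usable background probability.
        if k < min_count or p is None or not 0.0 < p < 1.0:
            continue
        expected = n * p
        z = (k - expected) / math.sqrt(expected * (1.0 - p))
        if z >= min_z:
            scored.append((word, z))
    # Highest z-score first.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored
```

Ranking by z favours words that are frequent in the pool but rare in general English, which is one plausible reason such keywords come out highly specific; common words such as "the" score low even when they occur often.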

Highlights

  • Advances in high-throughput technologies available to modern biology have created an increasing flood of experimentally determined facts

  • Gene Ontology (GO) terms are organized into three hierarchies: biological process, molecular function and cellular component. These GO terms are used by, e.g., the GOA project [4], which provides Gene Ontology annotations for the UniProt database, the International Protein Index (IPI) and other major databases such as Ensembl and NCBI (BMC Bioinformatics 2009, 10:148, http://www.biomedcentral.com/1471-2105/10/148)

  • Evaluation by precision and recall: we have conducted test experiments to assess the quality and usefulness of keywords obtained by HT-SAS

Summary

Results

Evaluation by precision and recall

We conducted test experiments to assess the quality and usefulness of the keywords obtained by HT-SAS.

Case study: annotation of genes expressed exclusively during the blood stages of Plasmodium berghei

To further assess the usefulness of the HT-SAS service, we analysed a protein set from a poorly described organism. This mimics one of the major applications of HT-SAS: enhancing the information associated with protein sequences deposited in databases. The results show that for over 20% of the proteins in this set, HT-SAS provided information sufficient to augment existing annotations or to create new ones. This demonstrates the usefulness of the approach to manual curators and to regular users, who can test whether HT-SAS can augment the annotations of their own protein(s).
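The precision and recall evaluation mentioned above can be illustrated with a short sketch. The text does not state how predicted keywords were matched against the SGD reference information, so exact, case-insensitive matching of keyword sets and macro-averaging over proteins are assumptions made here for illustration; `precision_recall` and `macro_average` are hypothetical helper names.

```python
def precision_recall(predicted, reference):
    """Return (precision, recall) of predicted keywords against a reference set.

    Matching is exact after lower-casing (an illustrative assumption).
    """
    pred = {w.lower() for w in predicted}
    ref = {w.lower() for w in reference}
    true_pos = len(pred & ref)  # keywords present in both sets
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(ref) if ref else 0.0
    return precision, recall


def macro_average(pairs):
    """Average precision and recall over (predicted, reference) pairs, one per protein."""
    scores = [precision_recall(pred, ref) for pred, ref in pairs]
    if not scores:
        return 0.0, 0.0
    mean_p = sum(p for p, _ in scores) / len(scores)
    mean_r = sum(r for _, r in scores) / len(scores)
    return mean_p, mean_r
```

Precision measures how many suggested keywords are supported by the reference; recall measures how much of the reference the keywords cover. A real evaluation might instead use fuzzy or synonym-aware matching.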
