Abstract

Background: The Enteropathogen Resource Integration Center (ERIC; http://www.ericbrc.org) aims to provide bioinformatics support for the scientific community researching enteropathogenic bacteria such as Escherichia coli and Salmonella spp. Rapid and accurate identification of experimental conclusions in the scientific literature is critical to research in this field. Natural Language Processing (NLP), and in particular Information Extraction (IE) technology, can significantly aid this process.

Description: We have trained a state-of-the-art IE system on a corpus of PubMed abstracts from the microbial literature to automatically identify and categorize biologically relevant entities and predicative relations. These relations include Genes/Gene Products and their Roles; Gene Mutations and the resulting Phenotypes; and Organisms and their associated Pathogenicity. Evaluations on blind datasets show an average F-measure above 90% for entities (genes, operons, etc.) and above 70% for relations (gene/gene product to role, etc.). This IE capability, combined with text indexing and relational database technologies, constitutes the core of our recently deployed text-mining application.

Conclusion: Our text-mining application is available online on the ERIC website. The information retrieval interface displays a list of recently published enteropathogen literature abstracts and also provides a search interface for custom queries by keyword, date range, and other criteria. When an abstract is selected, the processed text and the entities and relations extracted from it are retrieved from a relational database and marked up to highlight those entities and relations. The marked-up abstract also links extracted genes and gene products to the ERIC Annotations database, giving access to comprehensive genomic annotations and adding value to both the text-mining and annotation systems.
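The retrieval-and-markup step described above (stored entities fetched from a relational database, highlighted in the abstract, with genes linked to an annotation record) can be sketched with Python's standard `sqlite3` and `html` modules. This is a minimal sketch under stated assumptions: the table layout and column names, the entity labels `Gene`, `GeneProduct` and `Organism`, the made-up example sentence, and the `ANNOTATION_URL` link target are all hypothetical, since the abstract does not describe ERIC's actual schema or markup format.

```python
import html
import sqlite3
from urllib.parse import quote

# Hypothetical schema; the real ERIC database layout is not described in the source.
SCHEMA = """
CREATE TABLE abstract (pmid TEXT PRIMARY KEY, body TEXT NOT NULL);
CREATE TABLE entity (
    pmid TEXT NOT NULL REFERENCES abstract(pmid),
    start_pos INTEGER NOT NULL,  -- offset of the first character of the mention
    end_pos INTEGER NOT NULL,    -- offset just past the last character
    label TEXT NOT NULL          -- entity type, e.g. 'Gene' or 'Organism'
);
"""

# Hypothetical link target standing in for the ERIC Annotations database.
ANNOTATION_URL = "https://annotations.example.org/gene/{}"


def mark_up(conn: sqlite3.Connection, pmid: str) -> str:
    """Return HTML for one abstract with every stored entity mention highlighted."""
    row = conn.execute("SELECT body FROM abstract WHERE pmid = ?", (pmid,)).fetchone()
    if row is None:
        raise KeyError(pmid)
    body = row[0]
    rows = conn.execute(
        "SELECT start_pos, end_pos, label FROM entity WHERE pmid = ? ORDER BY start_pos",
        (pmid,),
    ).fetchall()

    parts, cursor = [], 0
    for start, end, label in rows:
        if start < cursor:  # this sketch simply skips overlapping mentions
            continue
        parts.append(html.escape(body[cursor:start]))  # plain text before the mention
        mention = html.escape(body[start:end])
        tag = f'<mark class="{html.escape(label)}">{mention}</mark>'
        if label in ("Gene", "GeneProduct"):
            # Genes and gene products also link out to their annotation record.
            url = ANNOTATION_URL.format(quote(body[start:end]))
            tag = f'<a href="{html.escape(url)}">{tag}</a>'
        parts.append(tag)
        cursor = end
    parts.append(html.escape(body[cursor:]))  # trailing text after the last mention
    return "".join(parts)


# Made-up example record, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
example_text = "Deletion of hilA reduced invasion by Salmonella."
conn.execute("INSERT INTO abstract VALUES (?, ?)", ("0001", example_text))
conn.executemany(
    "INSERT INTO entity VALUES (?, ?, ?, ?)",
    [("0001", 12, 16, "Gene"), ("0001", 37, 47, "Organism")],
)
print(mark_up(conn, "0001"))
```

Storing character offsets rather than pre-rendered HTML keeps the extraction output independent of presentation; only entity mentions are highlighted here, and relations could be rendered the same way from a second offset table.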

Highlights

  • The Enteropathogen Resource Integration Center (ERIC; http://www.ericbrc.org) has a goal of providing bioinformatics support for the scientific community researching enteropathogenic bacteria such as Escherichia coli and Salmonella spp.

  • The Enteropathogen Resource Integration Center (ERIC; http://www.ericbrc.org) [6,7] is one of eight Bioinformatics Resource Centers (BRCs) for Biodefense and Emerging/Re-Emerging Infectious Diseases (http://www.brccentral.org/) funded by the National Institute of Allergy and Infectious Diseases (NIAID; http://www3.niaid.nih.gov/)

  • ERIC serves as an information resource for enterobacteria from four genera on the NIAID list of select agents related to biodefense – Escherichia, Shigella, Salmonella, and Yersinia


Summary

Introduction

The Enteropathogen Resource Integration Center (ERIC; http://www.ericbrc.org) aims to provide bioinformatics support for the scientific community researching enteropathogenic bacteria such as Escherichia coli and Salmonella spp. We have trained a state-of-the-art information extraction (IE) system on a corpus of PubMed abstracts from the microbial literature to automatically identify and categorize biologically relevant entities and predicative relations. These relations include Genes/Gene Products and their Roles; Gene Mutations and the resulting Phenotypes; and Organisms and their associated Pathogenicity. At the heart of the system is ASAP, A Systematic Annotation Package for community analysis of genomes [8,9], which provides its users with a database of high-quality annotations backed by evidence codes. These annotations are produced by a dedicated team of annotators who combine manual examination of the experimental literature with automatic annotation methods.
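The three relation categories lend themselves to a typed-triple representation, which also makes the relation-level F-measure reported in the abstract straightforward to compute. The sketch below is illustrative only: it assumes exact-match scoring of (type, argument, argument) triples and the balanced F-measure (F1), neither of which the source specifies, and the class names and example relations are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class RelationType(Enum):
    """The three relation categories named in the text."""
    GENE_ROLE = "Gene/Gene Product -> Role"
    MUTATION_PHENOTYPE = "Gene Mutation -> Phenotype"
    ORGANISM_PATHOGENICITY = "Organism -> Pathogenicity"


@dataclass(frozen=True)  # frozen makes instances hashable, so they can live in sets
class Relation:
    kind: RelationType
    arg1: str  # gene/gene product, mutation, or organism mention
    arg2: str  # role, phenotype, or pathogenicity mention


def f_measure(predicted: set[Relation], gold: set[Relation]) -> tuple[float, float, float]:
    """Exact-match precision, recall, and balanced F-measure (F1)."""
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical gold-standard and system output, for illustration only.
gold = {
    Relation(RelationType.GENE_ROLE, "invA", "invasion"),
    Relation(RelationType.MUTATION_PHENOTYPE, "phoP mutant", "attenuated virulence"),
    Relation(RelationType.ORGANISM_PATHOGENICITY, "Shigella flexneri", "dysentery"),
}
predicted = {
    Relation(RelationType.GENE_ROLE, "invA", "invasion"),
    Relation(RelationType.MUTATION_PHENOTYPE, "phoP mutant", "attenuated virulence"),
    Relation(RelationType.GENE_ROLE, "phoP", "invasion"),  # a spurious prediction
}
p, r, f = f_measure(predicted, gold)
print(f"P={p:.2f} R={r:.2f} F={f:.2f}")  # P=0.67 R=0.67 F=0.67
```

Exact matching is the strictest criterion; a real evaluation might instead credit partial argument overlap, which would raise the scores.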

