Abstract

Background: Publication databases in biomedicine (e.g., PubMed, MEDLINE) grow rapidly every year, as do public databases of experimental biological data and the annotations derived from those data. Publications often contain evidence that confirms or disproves annotations, such as putative protein functions. However, it is increasingly difficult for biologists to identify and process published evidence, both because of the volume of papers and because there is no systematic approach for associating published evidence with experimental data and annotations. Natural Language Processing (NLP) tools can help address this growing divide by providing automatic, high-throughput detection of simple terms in publication text. However, NLP tools are not mature enough to identify complex terms, relationships, or events.

Results: In this paper we present and extend BioDEAL, a community evidence annotation system that introduces a feedback loop into the database-publication cycle to allow scientists to connect data-driven biological concepts to publications.

Conclusion: BioDEAL may change the way biologists relate published evidence to experimental data. Instead of biologists or research groups searching for and managing evidence independently, the community can collectively build and share this knowledge.

Highlights

  • Over the past decade, systems biology research has undergone two key transformations.

  • The Publication panel contains the publication text, for example, from PubMed or MEDLINE; BioDEAL supports both PDF and text (HTML, PHP, etc.) documents. The former is typically a full publication identified by its Uniform Resource Locator (URL) on the journal web site, while the latter may be an abstract from PubMed.

  • BioDEAL can present annotations generated by external projects such as BioCreAtIvE [9,18], whose overarching goal is to enhance abstracts with annotations.


Summary

Introduction

Systems biology research has undergone two key transformations. Public databases of experimentally generated -omics data are increasing in number, size, and diversity, along with annotations predicted from these data by computational tools. Such annotations may include protein functions predicted as part of genome annotation pipelines, high-resolution 3-dimensional protein structures predicted from amino acid sequence information alone, and protein-protein interactions and interaction networks derived from databases of yeast-2-hybrid or mass spectrometry pull-down experiments. There are currently over 20 million scientific abstracts in MEDLINE, growing at 500,000 articles per year [1]. Such articles often report the evidence discovered (e.g., through mutagenesis experiments) for various hypotheses derived by mining these heterogeneous databases of publicly available data and annotations. NLP tools are not mature enough to identify complex terms, relationships, or events.

