Abstract
Background
The biomedical literature continues to grow at a rapid pace, making knowledge retrieval and extraction ever more challenging. Tools for searching and mining the full text of the literature are therefore an important means of making these processes more efficient.
Results
We describe the next generation of the Textpresso information retrieval system, Textpresso Central (TPC). TPC builds on the strengths of the original system by expanding the full-text corpus to include the PubMed Central Open Access Subset (PMC OA) as well as the WormBase C. elegans bibliography. In addition, TPC allows users to create a customized corpus by uploading and processing documents of their choosing. TPC is UIMA compliant, to facilitate compatibility with external processing modules, and uses Lucene indexing and search technology to handle millions of full-text documents efficiently.
As in Textpresso, TPC searches can be performed using keywords and/or categories (semantically related groups of terms), but to provide better context for interpreting and validating queries, search results can now be viewed as highlighted passages in the context of the full text. To facilitate biocuration, TPC also allows users to select text spans from the full text and annotate them, create customized curation forms for any data type, and send the resulting annotations to external curation databases. As an example of such a curation form, we describe the integration of TPC with the Noctua curation tool developed by the Gene Ontology (GO) Consortium.
Conclusion
Textpresso Central is an online literature search and curation platform that enables biocurators and biomedical researchers to search and mine the full text of the literature by integrating keyword and category searches with viewing search results in the context of the full text. It also allows users to create customized curation interfaces, use those interfaces to make annotations linked to supporting evidence statements, and then send those annotations to any external database.
Textpresso Central URL: http://www.textpresso.org/tpc
Highlights
The biomedical literature continues to grow at a rapid pace, making the challenge of knowledge retrieval and extraction ever greater
Using the full text of C. elegans research papers, we demonstrated that combining categories from the Textpresso ontology with words or phrases increases the accuracy of text searches [12]
Because they identify groups of semantically meaningful terms, categories can be used for semi-automated information extraction, which speeds up curation and helps make it more sustainable in literature-based information resources such as the Model Organism Databases (MODs) [7, 13]
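To make the idea of combining categories with keywords concrete, here is a toy Python sketch of sentence-level matching. It is an illustration under stated assumptions, not TPC's implementation (TPC relies on Lucene indexing rather than regular-expression scans). The category names echo the search example in the Summary, but the term lists, the hypothetical child category `Kinase`, and the matching rule (the keyword is present and at least one term from every requested category, including its child categories, occurs in the sentence) are all assumptions made for illustration.

```python
import re

# Hypothetical toy ontology: each category lists its own terms and its child categories.
# These term lists and the "Kinase" child category are illustrative assumptions.
CATEGORIES = {
    "Enzymatic Activity": {"terms": {"kinase activity", "phosphorylation"}, "children": ["Kinase"]},
    "Kinase": {"terms": {"kinase", "autophosphorylation"}, "children": []},
    "tables and figures": {"terms": {"figure", "fig.", "table"}, "children": []},
}


def expand_terms(category: str) -> set[str]:
    """Collect the terms of a category and, recursively, of all its child categories."""
    node = CATEGORIES[category]
    terms = set(node["terms"])
    for child in node["children"]:
        terms |= expand_terms(child)
    return terms


def contains_term(sentence: str, term: str) -> bool:
    """Case-insensitive match of a term or phrase that is not embedded inside a longer word."""
    pattern = r"(?<!\w)" + re.escape(term.lower()) + r"(?!\w)"
    return re.search(pattern, sentence.lower()) is not None


def matches(sentence: str, keyword: str, categories: list[str]) -> bool:
    """True if the sentence has the keyword and at least one term from every category."""
    if not contains_term(sentence, keyword):
        return False
    return all(
        any(contains_term(sentence, term) for term in expand_terms(category))
        for category in categories
    )


sentences = [
    "ZYG-1 kinase activity is required for centriole duplication (Figure 2).",
    "zyg-1 mutants are temperature sensitive.",
    "SAS-6 is phosphorylated in vitro (Table 1).",
]
hits = [s for s in sentences if matches(s, "zyg-1", ["Enzymatic Activity", "tables and figures"])]
print(hits)
```

Only the first sentence survives: the second mentions the keyword but no enzymatic-activity term, and the third cites a table but never mentions the keyword. Requiring every category to match is what narrows a broad keyword search down to targeted sentences.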
Summary
Textpresso Central searches
Like the original Textpresso, Textpresso Central allows diverse modes of searching the literature, from simple keyword searches to well-defined, targeted searches that seek to answer specific biological questions. In the Noctua curation workflow, the curator first logs into the Noctua annotation tool, navigates to the Paper Markup Tools section of the Edit annotations feature in the Models menu, and clicks on the Textpresso Central (TPC) link. This link directs curators to the Textpresso Central homepage, where they can log in and perform the relevant search; in this case, the search is limited to the C. elegans corpus and consists of the keyword ‘zyg-1’ and the categories ‘Enzymatic Activity’ and ‘tables and figures’, together with their child terms. The latter category is included to restrict matching sentences to those that reference a table or figure in the associated paper. As long as the API of the external database accepts parameterized URIs or posts in JSON format, no additional configuration is necessary on the Textpresso Central site to send annotations to it.
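The two transport styles mentioned for external databases, parameterized URIs and JSON posts, can be sketched in a few lines of standard-library Python. This is a minimal sketch under stated assumptions, not TPC's export code: the endpoint URL and the field names `gene`, `term`, and `evidence` are hypothetical. The sketch builds both request forms for one annotation without sending anything over the network.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint of an external curation database (not a real TPC or Noctua URL).
ENDPOINT = "https://curation.example.org/api/annotations"


def as_parameterized_uri(annotation: dict) -> str:
    """Encode the annotation fields as query parameters of a GET-style URI."""
    return ENDPOINT + "?" + urlencode(annotation)


def as_json_post(annotation: dict) -> Request:
    """Build (but do not send) a POST request whose body is the annotation as JSON."""
    body = json.dumps(annotation).encode("utf-8")
    return Request(
        ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical annotation linking a gene to a function, with its supporting evidence sentence.
annotation = {
    "gene": "zyg-1",
    "term": "protein kinase activity",
    "evidence": "ZYG-1 kinase activity is required for centriole duplication (Figure 2).",
}

uri = as_parameterized_uri(annotation)
req = as_json_post(annotation)
print(uri)
print(req.get_method(), req.full_url)
```

To actually submit either form, a caller would pass the URI or the `Request` object to `urllib.request.urlopen`. Which form to use depends only on what the receiving database's API accepts, which is why the sending side needs no per-database configuration beyond the URL and field names.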