Abstract

AbstractTen years ago WormBase^1^ started as a repository for sequence data for the modelorganism Caenorhabditis elegans and has since striven to include the curation of allgenetic and molecular data published for this nematode. With a publication rate in the C.elegans field of approximately 800 papers per year, WormBase (WB) has the opportunity to include information from every paper published. Currently there are ~11,000 full text research papers (mid-1970's to the present) downloaded into the WB curation database, from which over 27 data types (i.e. genetic interactions, transgene objects, gene expression patterns, mutant phenotypes etc.) are extracted by curators. Textpresso^2^ is an open source text-mining tool capable of rapid searches for keywords, as well as concepts, from the full text of research papers. Curators at WB use Textpresso on a daily basis for many aspects of literature curation, from simple keyword searches to semi- or fully automated entity and fact extraction, which feed into curation pipelines or directly into the curation database itself. In addition, Textpresso greatly aids prioritization of literature curation by retrieving papers based on their full contents rather than solely on their abstracts. Such retrievable contents can range from the very particular (such as a gene simply being mentioned in the Materials and Methods section of a paper) to the complex (such as molecular functions that involve cellular components). As WB expands to incorporate the genomes of other nematodes, we will be working with Textpresso developers to set up a library of literature for related nematodes. We expect Textpresso to be crucial for most efficiently directing our efforts in literature curation, and for most quickly providing data to users searching the literature. In this workshop we will show how we use Textpresso in our curation pipeline to help with literature queries, to prioritize our workflow, and to automate data and fact extraction.1 "WormBase":http://www.wormbase.org2 "Textpresso":http://www.textpresso.org

Highlights

  • WormBase: a collaborative effort for genetics, genomics and biology of C. elegans and some related nematodes

  • “...all wild-type embryos have foci of GFP:SAS-4 associated with both sperm centrioles...“

  • GFP:: RAB-7 is visualized on the membranes of late endosomes srsx-3 expression was present in young odr-1 larvae but absent in odr-1 adults

Read more

Summary

Why incorporate text mining into our curation pipeline?

WormBase curators curate >25 different types of data, but there are still data types we don’t yet curate. For some data types we do curate, we have a backlog older papers

Site of Action
RNAi terms RNAi
Antibody verbs Antibody terms
Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call