Comparing manual and automated extraction of chemical entities from documents

Christian Tyrchan, Sorel Muresan

doi:10.1186/1758-2946-2-s1-p7

Abstract

The chemical information landscape is changing rapidly, with a yearly increase of over 1 million new compounds and more than 700,000 publications related to chemistry [1]. Exploring the chemical space covered by relevant journals and patents is a crucial step in early-stage medicinal chemistry projects. Extracting chemical entities from unstructured text is a complex task, and different approaches are currently used, including manual extraction by expert curators, text mining supported by chemical named entity recognition (NER), or combinations thereof [2]. The chemical information and corresponding annotations are subsequently stored in relational databases, allowing for complex chemical and text queries.

To assess the capability of chemical NER in documents and to understand the coverage and accuracy of the underlying data, we compared the chemistry extracted by manual curation (GVKBIO) and by text mining (SureChem) from a small patent corpus.

• GVKBIO databases are populated with explicit relationships between compounds, assays and sequence identifiers that have been manually extracted from journals and patents on a large scale [3].
• SureChem Portal [4] is a gateway for chemical patent search on full-text collections for USPTO, EPO and WO. SureChem users can perform structure and keyword searches on more than 9 million unique compounds.

We selected a set of 250 patents covering various target classes, for each of which a minimum of 25 records per patent were retrieved from the GVKBIO Patent database. The analysis was done using PipelinePilot protocols [5]. These initial results demonstrate the benefits and challenges of text mining for chemical information extraction from unstructured text.
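The per-patent comparison described above can be illustrated with a minimal sketch. This is not the authors' PipelinePilot protocol; it only shows one plausible way to compare two sets of extracted compounds per patent. It assumes each source has been exported as (patent ID, compound key) pairs, where the compound key is some canonical structure identifier such as an InChIKey. The function names, the export format and the use of unique compound keys as the "record" count are all hypothetical choices made for illustration.

```python
from collections import defaultdict

# Threshold taken from the abstract (minimum of 25 records per patent).
# Assumption: "records" is approximated here by unique compound keys.
MIN_RECORDS = 25


def group_by_patent(records):
    """Collect (patent_id, compound_key) pairs into {patent_id: set of keys}."""
    grouped = defaultdict(set)
    for patent_id, compound_key in records:
        grouped[patent_id].add(compound_key)
    return dict(grouped)


def compare_sources(manual, mined, min_records=MIN_RECORDS):
    """Compare manually curated and text-mined compound sets per patent.

    Patents with fewer than `min_records` manually curated compounds are
    skipped. The manual set is treated as the reference, which is itself
    an assumption: manual curation can also miss or mis-draw structures.
    """
    results = {}
    for patent_id, manual_keys in manual.items():
        if len(manual_keys) < min_records:
            continue
        mined_keys = mined.get(patent_id, set())
        shared = manual_keys & mined_keys
        results[patent_id] = {
            "shared": len(shared),
            "manual_only": len(manual_keys - mined_keys),
            "mined_only": len(mined_keys - manual_keys),
            # Fraction of manually curated compounds also found by text mining.
            "recall_vs_manual": len(shared) / len(manual_keys),
        }
    return results


if __name__ == "__main__":
    # Toy data with made-up keys; real keys would be structure identifiers.
    manual = group_by_patent([("P1", f"K{i}") for i in range(25)] + [("P2", "K0")])
    mined = group_by_patent([("P1", f"K{i}") for i in range(20, 30)])
    for patent_id, stats in compare_sources(manual, mined).items():
        print(patent_id, stats)
```

Exact matching on a structure key is a deliberately strict criterion: differences in salt forms, stereochemistry or tautomers between the two sources would count as mismatches unless the structures are normalized first.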

Journal: Journal of Cheminformatics
Publication Date: May 1, 2010
Citations: 2
License type: CC BY-NC 2.0
