Abstract

Finding mentions of chemical names in text is of great interest because of its importance in a wide range of application areas. The inherently complex structure of chemical names and the existence of several representations and nomenclatures (such as SMILES, InChI and IUPAC) make their automatic identification and classification challenging. In this paper we present a supervised machine learning approach based on Conditional Random Fields (CRF) to find mentions of IUPAC and IUPAC-like names in scientific text. We identify and implement a rich feature set for the task without using any domain-specific knowledge or resources. Experiments are carried out on benchmark MEDLINE datasets. Evaluation shows encouraging performance, with overall recall, precision and F-measure values of 90.96%, 91.52% and 91.23%, respectively. We also outline the scope for comparison with existing state-of-the-art systems.
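
To illustrate what a feature set built without domain-specific knowledge or resources might look like, the following is a minimal Python sketch of per-token surface features of the kind commonly fed to a CRF sequence labeller. The abstract does not list the actual features, so everything here is an assumption made for illustration: the function names `word_shape` and `token_features`, the feature keys, and the choice of prefix and suffix lengths are all hypothetical.

```python
import re


def word_shape(token):
    """Map a token to a coarse shape: A = uppercase, a = lowercase, 0 = digit.

    Runs of the same shape character are collapsed, so long names share shapes.
    (Hypothetical helper, not taken from the paper.)
    """
    shape = re.sub(r"[A-Z]", "A", token)
    shape = re.sub(r"[a-z]", "a", shape)
    shape = re.sub(r"[0-9]", "0", shape)
    return re.sub(r"(.)\1+", r"\1", shape)


def token_features(tokens, i):
    """Build a feature dict for tokens[i] using only surface information.

    No dictionaries or chemistry resources are consulted; the features are
    assumed examples of a domain-independent feature set.
    """
    tok = tokens[i]
    feats = {
        "lower": tok.lower(),
        "shape": word_shape(tok),
        "has_digit": any(c.isdigit() for c in tok),
        "has_hyphen": "-" in tok,
        "has_bracket": any(c in "()[]{}" for c in tok),
        "length": len(tok),
    }
    # Character prefixes and suffixes capture fragments such as "-ol" or "meth-".
    for n in (2, 3, 4):
        if len(tok) >= n:
            feats[f"prefix{n}"] = tok[:n].lower()
            feats[f"suffix{n}"] = tok[-n:].lower()
    # One token of context on each side, with sentence-boundary markers.
    feats["prev"] = tokens[i - 1].lower() if i > 0 else "<BOS>"
    feats["next"] = tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>"
    return feats


tokens = ["Treatment", "with", "2-methylpropan-1-ol", "was", "effective"]
features = [token_features(tokens, i) for i in range(len(tokens))]
```

A list of such dicts per sentence, paired with per-token labels (for example a begin/inside/outside scheme), is the usual input shape for CRF toolkits. The labelling scheme used in the paper is not stated in the abstract.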
