Abstract
The rapid growth of the scientific literature has created a pressing need for the automated extraction of chemical reaction information. Previously reported approaches to procedure extraction either (a) did not consider the semantic relations between an action and its arguments or (b) defined a detailed schema for the extraction. The former is insufficient for reproducing a reaction, while the latter is tied to its own schema and does not capture the general semantic relation between a verb and its arguments. In addition, neither provides annotated text that is aligned with the structured procedure. In this work, we propose a corpus named organic synthesis procedures with argument roles (OSPAR), which is annotated with rolesets to capture the semantic relation between each verb and its arguments. We also provide rolesets for chemical reactions, especially for organic synthesis, which represent the argument roles of the actions in the corpus. Specifically, we annotated 112 organic synthesis procedures from journal articles in Organic Syntheses and defined 19 new rolesets in addition to 29 rolesets taken from an existing language resource (Proposition Bank). We then constructed a simple deep learning system trained on OSPAR and assessed the usefulness of the corpus by comparing the system's output with chemical description language (XDL) generated by a natural language processing tool, SynthReader. Although our system's output required more detailed parsing, it covered information comparable to that of XDL. Moreover, we confirmed that the output action sequence was easy to validate because it was aligned with the original text.
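To make the idea of a text-aligned, roleset-annotated action concrete, the following is a minimal sketch and not the OSPAR format itself. The example sentence, the roleset ID `stir.01`, the choice of role labels, and all class and function names are assumptions made for illustration; only the general Proposition Bank conventions (numbered arguments such as `ARG1`, modifiers such as `ARGM-TMP`) are taken as given.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class Span:
    """Character offsets into the source text (start inclusive, end exclusive)."""
    start: int
    end: int

    def text(self, source: str) -> str:
        return source[self.start:self.end]


@dataclass
class Action:
    """One annotated action: a verb span, a roleset ID, and role-labelled spans."""
    roleset: str  # hypothetical ID written in PropBank's lemma.NN style
    verb: Span
    arguments: Dict[str, Span] = field(default_factory=dict)


def find_span(source: str, phrase: str) -> Span:
    """Locate the first occurrence of phrase and return its offsets."""
    start = source.index(phrase)
    return Span(start, start + len(phrase))


def render(action: Action, source: str) -> Dict[str, str]:
    """Resolve every span back to the original text so it can be checked by eye."""
    resolved = {"roleset": action.roleset, "verb": action.verb.text(source)}
    for role, span in action.arguments.items():
        resolved[role] = span.text(source)
    return resolved


# Illustrative sentence, not taken from the corpus.
sentence = "The mixture was stirred at room temperature for 2 h."

action = Action(
    roleset="stir.01",  # assumed roleset ID
    verb=find_span(sentence, "stirred"),
    arguments={
        "ARG1": find_span(sentence, "The mixture"),  # the thing being stirred
        "ARGM-TMP": find_span(sentence, "for 2 h"),  # temporal modifier
    },
)

print(render(action, sentence))
```

Because each role stores offsets rather than a copy of the words, every extracted action can be traced back to the exact text it came from, which is the property the abstract credits for making validation easy.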