Drop In Recall Research Articles

Background: Enzymatic and chemical reactions are key for understanding biological processes in cells. Curated databases of chemical reactions exist, but these databases struggle to keep up with the exponential growth of the biomedical literature. Conventional text mining pipelines provide tools to automatically extract entities and relationships from the scientific literature, and partially replace expert curation, but such machine learning frameworks often require a large amount of labeled training data and thus scale poorly to both larger document corpora and new relationship types.

Results: We developed an application of Snorkel, a weakly supervised learning framework, for extracting chemical reaction relationships from biomedical literature abstracts. For this work, we defined a chemical reaction relationship as the transformation of chemical A to chemical B. We built and evaluated our system on small annotated sets of chemical reaction relationships from two corpora: curated bacteria-related abstracts from the MetaCyc database (MetaCyc_Corpus) and a more general set of abstracts annotated with the MeSH (Medical Subject Headings) term Bacteria (Bacteria_Corpus; a superset of MetaCyc_Corpus). For the MetaCyc_Corpus, we obtained 84% precision and 41% recall (55% F1 score). Extending to the more general Bacteria_Corpus decreased precision to 62% with only a four-point drop in recall to 37% (46% F1 score). Overall, the Bacteria_Corpus contained two orders of magnitude more candidate chemical reaction relationships (nine million candidates vs 68,000 candidates) and had a larger class imbalance (2.5% positives vs 5% positives) as compared to the MetaCyc_Corpus. In total, we extracted 6871 chemical reaction relationships from nine million candidates in the Bacteria_Corpus.

Conclusions: With this work, we built a database of chemical reaction relationships from almost 900,000 scientific abstracts without a large training set of labeled annotations. Further, we showed the generalizability of our initial application built on MetaCyc documents enriched with chemical reactions to a general set of articles related to bacteria.
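The F1 scores quoted in the abstract follow from the reported precision and recall as their harmonic mean. The short sketch below only re-derives those figures; the function name `f1_score` and the variable names are illustrative and do not come from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (returns 0.0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
metacyc_f1 = f1_score(0.84, 0.41)
bacteria_f1 = f1_score(0.62, 0.37)

print(f"MetaCyc_Corpus F1:  {metacyc_f1:.0%}")   # 55%
print(f"Bacteria_Corpus F1: {bacteria_f1:.0%}")  # 46%
```

Because the harmonic mean is pulled toward the smaller of its two inputs, the 22-point precision loss on the Bacteria_Corpus costs only about nine points of F1 while recall stays close to 40%.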

In BioNLP-ST 2013: We participated in the BioNLP 2013 shared tasks on event extraction. Our extraction method is based on the search for an approximate subgraph isomorphism between key context dependencies of events and graphs of input sentences. Our system was able to address both the GENIA (GE) task, focusing on 13 molecular biology related event types, and the Cancer Genetics (CG) task, targeting a challenging group of 40 cancer biology related event types with varying arguments concerning 18 kinds of biological entities. In addition to adapting our system to the two tasks, we also attempted to integrate semantics into the graph matching scheme using a distributional similarity model for more events, and evaluated the event extraction impact of using paths of all possible lengths as key context dependencies beyond using only the shortest paths in our system. We achieved a 46.38% F-score in the CG task (ranking 3rd) and a 48.93% F-score in the GE task (ranking 4th).

After BioNLP-ST 2013: We explored three ways to further extend our event extraction system in our previously published work: (1) We allowed non-essential nodes to be skipped, and incorporated a node skipping penalty into the subgraph distance function of our approximate subgraph matching algorithm. (2) Instead of assigning a unified subgraph distance threshold to all patterns of an event type, we learned a customized threshold for each pattern. (3) We implemented the well-known Empirical Risk Minimization (ERM) principle to optimize the event pattern set by balancing prediction errors on training data against regularization. When evaluated on the official GE task test data, these extensions improved the extraction precision from 62% to 65%. However, the overall F-score stayed equivalent to the previous performance due to a 1% drop in recall.
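A rough arithmetic check shows why a three-point precision gain can leave the F-score essentially unchanged. The sketch below rests on two assumptions that the abstract does not state: that the 62% baseline precision belongs to the same GE run that scored 48.93% F-score, and that the "1% drop in recall" means one percentage point. The helper names are illustrative.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def recall_from(precision: float, f: float) -> float:
    """Solve F = 2PR / (P + R) for R, given P and F."""
    return f * precision / (2 * precision - f)


# ASSUMPTION: 62% precision and 48.93% F-score describe the same GE run.
baseline_p, baseline_f = 0.62, 0.4893
baseline_r = recall_from(baseline_p, baseline_f)

# ASSUMPTION: the recall drop is one percentage point.
extended_f = f_score(0.65, baseline_r - 0.01)

print(f"implied baseline recall: {baseline_r:.1%}")
print(f"F-score after extensions: {extended_f:.1%}")
```

Under these assumptions the implied baseline recall is roughly 40%, and the F-score after the extensions lands within a fraction of a point of the baseline, which matches the abstract's description of equivalent overall performance.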

Articles published on Drop In Recall

Improving dictionary-based named entity recognition with deep learning.

A cardiologist-like computer-aided interpretation framework to improve arrhythmia diagnosis from imbalanced training datasets

Effect of CT reconstruction settings on the performance of a deep learning based lung nodule CAD system

Extracting chemical reactions from text using Snorkel

PointNet and geometric reasoning for detection of grape vines from single frame RGB-D data in outdoor conditions

ArtiFuse-computational validation of fusion gene detection tools without relying on simulated reads.

Optimizing graph-based patterns to extract biomedical events from the literature.

The Lingering Effects of Tobacco Control Advertising

Discrimination of pop‐up conditions according to PC usage situation for dissemination of business information

Changes in retrograde memory following temporal lobectomy

The effect of changing from one to two views at incident (subsequent) screens in the NHS breast screening programme in England: impact on cancer detection and recall rates

The "memory cliff" beyond span in immediate recall.
