Abstract

Background

Automatic literature-based discovery attempts to uncover new knowledge by connecting existing facts: information extracted from existing publications in the form of A → B and B → C relations can be simply connected to deduce A → C. However, using this approach, the quantity of proposed connections is often too vast to be useful. It can be reduced by using subject → (predicate) → object triples as the A → B relations, but too many proposed connections remain for manual verification.

Results

Based on the hypothesis that only a small number of subject–predicate–object triples extracted from a publication represent the paper's novel contribution(s), we explore using BERT embeddings to identify these before literature-based discovery is performed using only these important triples. While the method exploits the availability of full texts of publications in the CORD-19 dataset to build a training set (making use of the fact that a novel contribution is likely to be mentioned in both the abstract and the body of a paper), the resulting tool can be applied to papers for which only abstracts are available. Candidate hidden-knowledge pairs generated from unfiltered triples are compared against those built from important triples only, using a variety of time-slicing gold standards.

Conclusions

The quantity of proposed knowledge pairs is reduced by a factor of 10³, and we show that when the gold standard is designed to avoid rewarding background knowledge, the precision obtained increases by up to a factor of 10. We argue that the gold standard needs to be carefully considered, and we release as-yet-undiscovered candidate knowledge pairs based on important triples alongside this work.
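To make the A → B, B → C ⇒ A → C step in the Background concrete, the following is a minimal sketch of closed discovery over a small set of triples. The triples and predicate names are hypothetical illustrations, not extractions from CORD-19 or the authors' pipeline.

```python
# A minimal sketch of the A -> B, B -> C  =>  A -> C inference step.
# The triples below are invented examples for illustration only.
from collections import defaultdict

triples = [
    ("curcumin", "INHIBITS", "interleukin-6"),      # A -> B
    ("interleukin-6", "CAUSES", "inflammation"),    # B -> C
    ("vitamin D", "STIMULATES", "cathelicidin"),
]

# Index triples by subject so every B -> C relation can be looked up quickly.
by_subject = defaultdict(list)
for subj, pred, obj in triples:
    by_subject[subj].append((pred, obj))

# Connect each A -> B with any matching B -> C to propose candidate
# A -> C hidden-knowledge pairs.
candidates = set()
for a, _, b in triples:
    for _, c in by_subject.get(b, []):
        if a != c:
            candidates.add((a, c))

print(candidates)  # {('curcumin', 'inflammation')}
```

Even on realistic inputs this step is a simple join on the shared B term, which is why the number of proposed pairs grows so quickly and filtering becomes essential.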

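The Results paragraph describes filtering triples with BERT embeddings before discovery. A sketch of that idea, under stated assumptions, is shown below: the encoder model, classifier choice, and training examples here are all illustrative stand-ins, with labels mimicking the paper's signal that a novel contribution tends to appear in both the abstract and the body of a paper.

```python
# A sketch of scoring triples for "importance" with BERT embeddings.
# Model name, classifier, and training data are illustrative assumptions,
# not the authors' exact configuration.
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

model = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in BERT-style encoder

# Hypothetical training data: triple verbalizations labeled 1 if the triple
# was found in both the abstract and the body of its source paper.
train_texts = ["curcumin INHIBITS interleukin-6",
               "patients WERE_ADMITTED_TO hospital"]
train_labels = [1, 0]

clf = LogisticRegression().fit(model.encode(train_texts), train_labels)

# Keep only triples the classifier considers likely novel contributions;
# literature-based discovery then runs on this reduced set.
new_texts = ["vitamin D STIMULATES cathelicidin"]
important = [t for t, p in zip(new_texts, clf.predict(model.encode(new_texts)))
             if p == 1]
print(important)
```

Because the classifier only needs embeddings of triple text, it can score triples from abstract-only papers at inference time, matching the claim that the tool applies beyond full-text corpora.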