Automatic identification of recent high impact clinical articles in PubMed to support clinical decision making using time-agnostic features.

Jiantao Bian, Samir Abdelrahman, Guilherme Del Fiol, Jianlin Shi

doi:10.1016/j.jbi.2018.11.010

Open Access

https://doi.org/10.1016/j.jbi.2018.11.010

Abstract

Finding recent clinical studies that warrant changes in clinical practice ("high impact" clinical studies) in a timely manner is very challenging. We investigated a machine learning approach to find recent studies with high clinical impact to support clinical decision making and literature surveillance. To identify recent studies, we developed our classification model using time-agnostic features that are available as soon as an article is indexed in PubMed®, such as journal impact factor, author count, and study sample size. Using a gold standard of 541 high impact treatment studies referenced in 11 disease management guidelines, we tested the following null hypotheses: (1) the high impact classifier with time-agnostic features (HI-TA) performs equivalently to PubMed's Best Match sort and a MeSH-based Naïve Bayes classifier; and (2) HI-TA performs equivalently to the high impact classifier with both time-agnostic and time-sensitive features (HI-TS) enabled in a previous study. The primary outcome for both hypotheses was mean top 20 precision. The differences in mean top 20 precision between HI-TA and three baselines (PubMed's Best Match, a MeSH-based Naïve Bayes classifier, and HI-TS) were not statistically significant (12% vs. 3%, p = 0.101; 12% vs. 11%, p = 0.720; 12% vs. 25%, p = 0.094, respectively). Recall of HI-TA was low (7%). HI-TA had equivalent performance to state-of-the-art approaches that depend on time-sensitive features. With the advantage of relying only on time-agnostic features, the proposed approach can be used as an adjunct to help clinicians identify recent high impact clinical studies to support clinical decision-making. However, low recall limits the use of HI-TA for literature surveillance.
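The primary outcome, mean top 20 precision, and the reported recall can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the toy identifiers, and the convention of dividing by k even when fewer than k articles are retrieved are all assumptions made for illustration.

```python
from typing import Sequence, Set, Tuple


def precision_at_k(ranked_ids: Sequence[str], gold_ids: Set[str], k: int = 20) -> float:
    """Fraction of the top-k ranked articles that appear in the gold standard.

    Assumption: the denominator is always k, so a ranking shorter than k
    is penalized for the missing positions.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for article_id in ranked_ids[:k] if article_id in gold_ids)
    return hits / k


def recall(retrieved_ids: Sequence[str], gold_ids: Set[str]) -> float:
    """Fraction of gold-standard articles that were retrieved at any rank."""
    if not gold_ids:
        return 0.0
    return len(set(retrieved_ids) & gold_ids) / len(gold_ids)


def mean_precision_at_k(
    runs: Sequence[Tuple[Sequence[str], Set[str]]], k: int = 20
) -> float:
    """Average precision@k over several (ranking, gold standard) pairs,
    for example one pair per disease topic (hypothetical grouping)."""
    if not runs:
        raise ValueError("at least one run is required")
    return sum(precision_at_k(ranked, gold, k) for ranked, gold in runs) / len(runs)


# Toy data with made-up identifiers, not results from the study.
toy_runs = [
    (["a", "b", "c", "d"], {"a", "c", "z"}),
    (["x", "y"], {"y"}),
]
toy_mean_p_at_2 = mean_precision_at_k(toy_runs, k=2)
toy_recall = recall(["a", "b", "c", "d"], {"a", "c", "z"})
```

In this sketch a classifier can rank well within its top 20 and still have low recall, because recall is measured against the entire gold standard rather than against the top-ranked positions only.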