Background: The HBCP knowledge system identifies and extracts entities from randomised controlled trials of behaviour change interventions, organised according to a behaviour change intervention ontology (BCIO), to populate 1) an outcome prediction tool and 2) a research browser tool. The knowledge system relies on automated information extraction algorithms to query and interpret evidence from behaviour change intervention (BCI) reports. This paper reports an evaluation of the automated information extraction and reflects on the results in relation to the challenges of interdisciplinary working and collaboration.

Methods: The evaluation assessed the performance of the information extraction system on a dataset of 117 previously unseen BCI reports. Trained annotators compared the automatically extracted information with the full-text PDF of each report. They checked the essential BCIO entities required for the outcome prediction tool and the research browser tool, and whether the extracted information was assigned to the correct arm of the randomised trial. The essential entities were the outcome value, a selection of the most common Behaviour Change Techniques (BCTs), the mode of intervention delivery, and key population characteristics.

Results: An outcome value was present in 53.85% (n=63) of the output from the information extraction system, but in no case did the system both extract the correct outcome values for the intervention and control arms and assign them to the correct study arms. Although 84.62% (n=99) of the papers contained information relevant to BCTs, the information extraction algorithm correctly extracted only 58.59% (n=58) of BCTs.

Conclusions: The information extraction algorithm did not extract the outcome values and key BCIO entities correctly against the correct arms in any of the papers in the sample, making it unsuitable for deployment in the outcome prediction and research browser tools. Several challenges of working in interdisciplinary teams were identified and are discussed, along with lessons learned for future work.
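The reported percentages can be reproduced from the counts given above. The short sketch below is only an arithmetic check, not part of the evaluation itself; the helper name `pct` is hypothetical, and the denominator of 99 for the BCT figure is an assumption inferred from the numbers (58 of the 99 papers with BCT-relevant information), since the abstract does not state it explicitly.

```python
def pct(count: int, total: int) -> float:
    """Return count/total as a percentage rounded to two decimals."""
    return round(100 * count / total, 2)

TOTAL_REPORTS = 117       # previously unseen BCI reports in the evaluation
OUTCOME_PRESENT = 63      # outputs in which an outcome value was present
BCT_RELEVANT = 99         # papers containing BCT-relevant information
BCT_CORRECT = 58          # correct BCT extractions (denominator assumed to be 99)

outcome_rate = pct(OUTCOME_PRESENT, TOTAL_REPORTS)
bct_relevant_rate = pct(BCT_RELEVANT, TOTAL_REPORTS)
bct_correct_rate = pct(BCT_CORRECT, BCT_RELEVANT)

print(outcome_rate, bct_relevant_rate, bct_correct_rate)
```

Under that assumption, the first two figures are shares of all 117 reports, while the third is a share of the 99 papers with BCT-relevant information only.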