Making head and neck cancer clinical data Findable-Accessible-Interoperable-Reusable to support multi-institutional collaboration and federated learning

Varsha Gouthamchand, Frederik Wesseling, Ananya Choudhury, Leonard Wee, Sejin Kim, Benjamin Haibe-Kains, Frank Hoebers, Mattea Welch, Joanna Kazmierska, Johan Van Soest, Andre Dekker

doi:10.1093/bjrai/ubae005

Open Access

Journal: BJR|Artificial Intelligence
Publication Date: Mar 4, 2024
License type: CC BY 4.0

Abstract

Objectives: Federated learning (FL) is a group of methodologies where statistical modelling can be performed without exchanging identifiable patient data between cooperating institutions. To realize its potential for AI development on clinical data, a number of bottlenecks need to be addressed. One of these is making data Findable-Accessible-Interoperable-Reusable (FAIR). The primary aim of this work is to show that tools making data FAIR allow consortia to collaborate on privacy-aware data exploration, data visualization, and training of models on each other’s original data.

Methods: We propose a “Schema-on-Read” FAIR-ification method that adapts for different (re)analyses without needing to change the underlying original data. The procedure involves (1) decoupling the contents of the data from its schema and database structure, (2) annotation with semantic ontologies as a metadata layer, and (3) readout using semantic queries. Open-source tools are given as Docker containers to help local investigators prepare their data on-premises.

Results: We created a federated privacy-preserving visualization dashboard for case mix exploration of 5 distributed datasets with no common schema at the point of origin. We demonstrated robust and flexible prognostication model development and validation, linking together different data sources: clinical risk factors and radiomics.

Conclusions: Our procedure leads to successful (re)use of data in FL-based consortia without the need to impose a common schema at every point of origin of data.

Advances in knowledge: This work supports the adoption of FL within the healthcare AI community by sharing means to make data more FAIR.
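To make the "Schema-on-Read" idea concrete, the following is a minimal illustrative sketch in Python and not the authors' tooling. Plain dictionaries stand in for the ontology annotations and for the semantic queries; the site data, column names, and term identifiers (the `ex:` prefixed strings) are all hypothetical placeholders. It shows two sites with different local schemas being read through a shared vocabulary, with only aggregate counts leaving each site.

```python
from collections import Counter

# Hypothetical shared vocabulary terms (placeholders, not real ontology codes).
SEX = "ex:biological_sex"
MALE = "ex:male"
FEMALE = "ex:female"

# Step 1: original data stay as they are; each site keeps its own schema.
SITE_A_ROWS = [{"gender": "M"}, {"gender": "F"}, {"gender": "M"}]
SITE_B_ROWS = [{"sex_code": 1}, {"sex_code": 2}]

# Step 2: a separate annotation layer links local columns and values
# to shared terms. The raw rows above are never rewritten.
SITE_A_ANNOTATIONS = {SEX: {"column": "gender", "values": {"M": MALE, "F": FEMALE}}}
SITE_B_ANNOTATIONS = {SEX: {"column": "sex_code", "values": {1: MALE, 2: FEMALE}}}


def read_concept(rows, annotations, concept):
    """Step 3: resolve a shared concept against local data at read time."""
    mapping = annotations[concept]
    for row in rows:
        # Unmapped local values resolve to None rather than raising.
        yield mapping["values"].get(row[mapping["column"]])


def local_counts(rows, annotations, concept):
    """Runs on-premises; returns only aggregate counts, no patient rows."""
    terms = read_concept(rows, annotations, concept)
    return Counter(term for term in terms if term is not None)


def federated_counts(sites, concept):
    """Combine per-site aggregates, as a case mix dashboard might."""
    total = Counter()
    for rows, annotations in sites:
        total += local_counts(rows, annotations, concept)
    return total


SITES = [(SITE_A_ROWS, SITE_A_ANNOTATIONS), (SITE_B_ROWS, SITE_B_ANNOTATIONS)]
case_mix = federated_counts(SITES, SEX)
```

Supporting a new (re)analysis in this sketch means adding or changing entries in the annotation dictionaries, while the source rows remain untouched. A real deployment would also need safeguards this sketch omits, such as suppressing small counts before they leave a site.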

Full Text