Data-driven machine learning in medical research and diagnostics needs large-scale datasets curated by clinical experts. Generating large datasets can be challenging in terms of resources and time, while the generalizability and validation of the developed models benefit significantly from variety in data sources. Training algorithms on smaller decentralized datasets through federated learning can reduce this effort, but requires the implementation of a specific and ambitious infrastructure to share data, algorithms and computing time. Additionally, federated learning makes it possible to keep the data local. Data safety issues can thus be avoided because patient data need not be shared. Machine learning models are trained on local data by sharing the model through an established network. In addition to commercial applications, numerous academic and customized implementations of network infrastructures are available. The configurations of these networks differ, yet they adhere to a standard framework composed of fundamental components. In this technical note, we propose basic infrastructure requirements for data governance, data science workflows, and local node set-up, and report on the advantages and pitfalls experienced in implementing the local infrastructure, using the German Radiological Cooperative Network initiative as the use case. We show how the infrastructure can be built upon a set of base components to reflect the needs of a federated learning network and how these components can be implemented considering both local and global network requirements. After analyzing the deployment process in different settings and scenarios, we recommend integrating the local node into an existing clinical IT infrastructure. This approach offers benefits in terms of maintenance and deployment effort compared to external integration in a separate environment (e.g., the radiology department).
This proposed groundwork can be taken as an exemplary development guideline for future applications of federated learning networks in clinical and scientific environments.