A federated cloud architecture for processing of cancer images on a distributed storage

J Damián Segrelles Quilis, Sergio López-Huguet, Pau Lozano, Ignacio Blanquer

doi:10.1016/j.future.2022.09.019

Abstract

The increased accuracy and exhaustiveness of modern Artificial Intelligence techniques in supporting the analysis of complex data, such as medical images, have exponentially increased real-world data collection for research purposes. This has led to the development of international repositories and high-performance computing solutions to cope with the computational demand of training models. However, other stages in the development of medical imaging biomarkers do not require such intensive computing resources, which makes it convenient to integrate different computing backends, each tailored to the processing demands of a particular stage of the workflow. This article presents a distributed and federated repository architecture for the development and application of medical image biomarkers that combines multiple cloud storages with cloud and HPC processing backends. The architecture has been deployed to serve the PRIMAGE (H2020 826494) project, which aims to collect and manage data on paediatric cancer. The repository seamlessly integrates distributed storage backends, an elastic Kubernetes cluster on an on-premises cloud, and a supercomputer. Processing jobs are handled through a single control platform, which synchronises data on demand. The article specifies the different types of applications and validates the platform through a use case that exercises most of its features.

Full Text