Abstract

With the spread of easy-to-use and effective frameworks, Deep Learning approaches have in recent years become increasingly common in several application fields, including computer vision (such as natural and biomedical image processing), automatic speech recognition (ASR) and time-series analysis. If, on the one hand, the availability of such frameworks allows developers to use the one they feel most comfortable with, on the other, it raises questions about the reproducibility of the designed model across different hardware and software configurations, both at training and at inference time. Assessing reproducibility is important to determine whether the resulting model produces good or bad outcomes merely because of more or less fortunate environmental training conditions. This is a non-trivial problem for Deep Learning based applications, not only because their training and optimization phases rely heavily on stochastic procedures, but also because of the use of heuristics (mainly speculative procedures) at training time that, although they help reduce the required computational effort, tend to introduce non-deterministic behavior, with a direct impact on the results and on the model's reproducibility. To face this problem, designers usually rely on probabilistic assumptions about the distribution of the data or focus on very large datasets. However, this kind of approach does not fit the standards of some application fields (such as medical image analysis with Computer-Aided Detection and Diagnosis systems, CAD) that require strong, demonstrable proof of effectiveness and of the repeatability of results across the population. It is our opinion that in those cases it is crucially important to clarify whether, and to what extent, a Deep Learning based application is stable, repeatable and effective across different environmental (hardware and software) configurations. The aim of this work is therefore to quantitatively analyze the reproducibility of Convolutional Neural Network (CNN) based approaches for biomedical image processing, in order to highlight the impact that a given software framework and hardware configuration might have when facing the same problem by the same means. In particular, we analyzed the problem of breast tissue segmentation in DCE-MRI using a modified version of the 2D U-Net, a very effective deep architecture for semantic segmentation, implemented in two Deep Learning frameworks (MATLAB and TensorFlow) and run across different hardware configurations.
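As a concrete illustration of the non-determinism discussed above, the following is a minimal sketch of the determinism controls one might apply in a TensorFlow training script; the seed value is an arbitrary assumption, and the final call requires TensorFlow 2.8 or later, neither of which is prescribed by this work:

    import os
    import random

    import numpy as np
    import tensorflow as tf

    SEED = 42  # arbitrary choice for illustration; any fixed value works

    # Seed every pseudo-random generator involved in training
    # (Python's hash randomization, the stdlib RNG, NumPy and TensorFlow).
    os.environ["PYTHONHASHSEED"] = str(SEED)
    random.seed(SEED)
    np.random.seed(SEED)
    tf.random.set_seed(SEED)

    # Ask TensorFlow to select deterministic kernels where available
    # (TF >= 2.8); this trades training speed for run-to-run repeatability.
    tf.config.experimental.enable_op_determinism()

Even with such controls in place, results are only repeatable on a fixed hardware and software stack: different GPUs, driver versions, or frameworks (e.g. MATLAB vs. TensorFlow) may still yield different models from identical data and code, which is precisely the cross-configuration variability this work sets out to quantify.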
