Standardization of brain MR images across machines and protocols: bridging the gap for MRI-based radiomics

Alexandre Carré, Frédéric Dhermain, Nikos Paragios, Johan Pallud, Eric Deutsch, Emilie Alvarez Andres, Charlotte Robert, Maria Vakalopoulou, Théo Estienne, Guillaume Klausner, Catherine Oppenheim, Marvin Lerousseau, Enzo Battistella, Stéphane Niyoteka, Roger Sun, Samy Ammari, Sylvain Reuzé, Myriam Edjlali, Jade Briend-Diop

doi:10.1038/s41598-020-69298-z

Abstract

Radiomics relies on the extraction of a wide variety of quantitative image-based features to provide decision support. Magnetic resonance imaging (MRI) contributes to the personalization of patient care but suffers from being highly dependent on acquisition and reconstruction parameters. Today, there are no guidelines regarding the optimal pre-processing of MR images in the context of radiomics, which is crucial for the generalization of published image-based signatures. This study aims to assess the impact of three different intensity normalization methods (Nyul, WhiteStripe, Z-Score) typically used in MRI together with two methods for intensity discretization (fixed bin size and fixed bin number). The impact of these methods was evaluated on first- and second-order radiomics features extracted from brain MRI, establishing a unified methodology for future radiomics studies. Two independent MRI datasets were used. The first one (DATASET1) included 20 institutional patients with WHO grade II and III gliomas who underwent post-contrast 3D axial T1-weighted (T1w-gd) and axial T2-weighted fluid attenuation inversion recovery (T2w-flair) sequences on two different MR devices (1.5 T and 3.0 T) with a 1-month delay. Jensen–Shannon divergence was used to compare pairs of intensity histograms before and after normalization. The stability of first-order and second-order features across the two acquisitions was analysed using the concordance correlation coefficient and the intra-class correlation coefficient. The second dataset (DATASET2) was extracted from the public TCIA database and included 108 patients with WHO grade II and III gliomas and 135 patients with WHO grade IV glioblastomas. The impact of normalization and discretization methods was evaluated based on a tumour grade classification task (balanced accuracy measurement) using five well-established machine learning algorithms. 
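As an illustration of the histogram comparison described above, the following is a minimal sketch of the Jensen–Shannon divergence between two intensity histograms. The function names, the equal-width binning and the base-2 logarithm (which bounds the result between 0 and 1) are assumptions made for this example, not details taken from the study.

```python
import math


def histogram(values, n_bins, lo, hi):
    """Normalized histogram with n_bins equal-width bins over [lo, hi].

    Assumes hi > lo and that all values lie within [lo, hi].
    """
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        # A value equal to the upper edge is placed in the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    total = len(values)
    return [c / total for c in counts]


def jensen_shannon_divergence(p, q):
    """Base-2 Jensen-Shannon divergence between two discrete distributions."""

    def kl(x, m):
        # Terms with x_i == 0 contribute nothing to the KL divergence.
        return sum(xi * math.log2(xi / mi) for xi, mi in zip(x, m) if xi > 0)

    mix = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return 0.5 * kl(p, mix) + 0.5 * kl(q, mix)
```

Both histograms must be built over the same intensity range and bin count for the comparison to be meaningful; identical histograms give a divergence of 0 and histograms with no overlap give 1.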
Intensity normalization substantially improved the robustness of first-order features and the performance of subsequent classification models. For the T1w-gd sequence, the mean balanced accuracy for tumour grade classification increased from 0.67 (95% CI 0.61–0.73) with no normalization to 0.82 (95% CI 0.79–0.84, P = .006), 0.79 (95% CI 0.76–0.82, P = .021) and 0.82 (95% CI 0.80–0.85, P = .005) using the Nyul, WhiteStripe and Z-Score normalization methods, respectively. Relative discretization makes intensity normalization unnecessary for second-order radiomics features. Although the number of bins used for discretization had only a small impact on classification performance, 32 bins offered a good compromise for both T1w-gd and T2w-flair sequences. No significant improvement in classification performance was observed using feature selection. A standardized pre-processing pipeline is proposed for the use of radiomics in MRI of brain tumours. For models based on first- and second-order features, we recommend normalizing images with the Z-Score method and adopting an absolute discretization approach. For second-order feature-based signatures, relative discretization can be used without prior normalization. In both cases, 32 bins for discretization are recommended. This study may pave the way for the multicentric development and validation of MR-based radiomics biomarkers.
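The two discretization schemes named in the abstract can be sketched as follows. This is a minimal illustration using commonly used definitions; the function names, the 1-based bin indices and the mapping of "relative" to fixed bin number and "absolute" to fixed bin size are assumptions of this example rather than the authors' implementation.

```python
import math


def discretize_fixed_bin_number(values, n_bins=32):
    """Relative discretization: n_bins equal-width bins between the ROI min and max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant region falls entirely into the first bin.
        return [1 for _ in values]
    out = []
    for v in values:
        idx = math.floor(n_bins * (v - lo) / (hi - lo)) + 1
        out.append(min(idx, n_bins))  # the ROI maximum goes in the last bin
    return out


def discretize_fixed_bin_size(values, bin_width, lower_bound):
    """Absolute discretization: bins of fixed width starting at a fixed lower bound."""
    return [math.floor((v - lower_bound) / bin_width) + 1 for v in values]
```

With the fixed bin number scheme, multiplying all intensities in the region by a positive constant leaves the bin indices unchanged, which gives some intuition for why relative discretization can be used without prior normalization. With the fixed bin size scheme, the bin indices depend on the intensity scale, so the images need to be brought to a common scale first.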

Highlights

A good compromise was obtained using 32 bins for discretization, considering both T1w-gd and T2w-flair sequences
Even though these three types of pre-processing of brain magnetic resonance imaging (MRI) are widely accepted by the community, there is no consensus within radiomics studies regarding the applied image normalization method (Table 1)
We focused on three normalization methods that were selected for their representativeness within current radiomics studies (Nyul, WhiteStripe and Z-Score)

Summary

Introduction

Image intensities vary widely across inter-patient and intra-patient acquisitions, which could strongly affect the extraction of radiomics features and compromise the pooling and reproducibility of published data using independent imaging sets[6,7]. To solve this problem, previous radiomics studies have focused on image pre-processing techniques. Brain extraction is mandatory to remove the skull regions that generate the largest variations in intensities and to define the region in which intensities should be considered before any image intensity normalization[13,14]. Even though these three types of pre-processing of brain MRI are widely accepted by the community, there is no consensus within radiomics studies regarding the applied image normalization method (Table 1). The Z-Score method consists of subtracting the mean intensity of the entire image or a region of interest from each voxel value and dividing the result by the corresponding standard deviation[34].
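The Z-Score normalization described above can be sketched in a few lines. This is a minimal example operating on a flat list of voxel intensities; the function name, the optional `mask` argument (for instance a brain mask obtained after skull removal) and the use of the population standard deviation are assumptions of this sketch, not details confirmed by the study.

```python
import statistics


def z_score_normalize(voxels, mask=None):
    """Subtract the reference mean and divide by the reference standard deviation.

    voxels: flat sequence of intensities.
    mask:   optional sequence of booleans selecting the voxels used to compute
            the mean and standard deviation (hypothetical region of interest).
    """
    ref = voxels if mask is None else [v for v, m in zip(voxels, mask) if m]
    mean = statistics.fmean(ref)
    std = statistics.pstdev(ref)  # population standard deviation (assumption)
    if std == 0:
        raise ValueError("reference region has zero intensity variance")
    # The statistics come from the reference region, but every voxel is rescaled.
    return [(v - mean) / std for v in voxels]
```

For example, `z_score_normalize([2, 4, 4, 4, 5, 5, 7, 9])` uses a mean of 5 and a standard deviation of 2, so the first voxel maps to -1.5 and the last to 2.0.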

Methods

Results

Conclusion