Reliability assessment of tissue classification algorithms for multi-center and multi-scanner data

Mahsa Dadar, Simon Duchesne

doi:10.1016/j.neuroimage.2020.116928

Abstract

Background: Differences and changes in gray matter (GM) and white matter (WM) volume are important imaging markers of pathology and disease progression in neurology and psychiatry. Such measures are usually estimated from tissue segmentation maps produced by publicly available image processing pipelines. However, the reliability of these segmentations on multi-center and multi-scanner data remains understudied. Here, we assess the robustness of six publicly available tissue classification pipelines across images acquired on different MR scanners and at different sites.

Methods: We used 90 T1-weighted images of a single individual, scanned in 73 sessions across 27 different sites, to assess the robustness of the tissue classification tools. Variability in Dice similarity index values and tissue volumes was assessed for Atropos, BISON, Classify_Clean, FAST, FreeSurfer, and SPM12.

Results: BISON had the highest overall Dice coefficient for GM, followed by SPM12 and Atropos, while Atropos had the highest overall Dice coefficient for WM, followed by BISON and SPM12. BISON had the lowest overall variability in its volumetric estimates, followed by FreeSurfer and SPM12. All methods also showed significant differences in some of their estimates across scanner manufacturers (e.g. BISON had significantly higher GM and correspondingly lower WM estimates for GE scans than for Philips and Siemens scans) and across signal-to-noise ratio (SNR) levels (e.g. FAST and FreeSurfer had significantly higher WM and correspondingly lower GM volume estimates for the high SNR tertile than for the medium and low tertiles).

Conclusions: Our comparisons provide a benchmark of the reliability of publicly available tissue classification techniques and of the amount of variability that can be expected when using large multi-center and multi-scanner databases.
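The two quantities named in the Methods, Dice overlap and tissue volume, can be illustrated with a minimal Python/NumPy sketch. The Dice similarity index between two binary masks A and B is 2|A ∩ B| / (|A| + |B|), ranging from 0 (no overlap) to 1 (identical masks). The sketch assumes each pipeline's output has already been brought onto a common voxel grid as an integer label map; the label codes (1 = CSF, 2 = GM, 3 = WM) and all function names are hypothetical, and the coefficient of variation is only one possible summary of volumetric variability, not necessarily the statistic used in the study.

```python
import numpy as np

# Hypothetical label codes; real pipelines (FAST, SPM12, FreeSurfer, ...) use their own conventions.
CSF, GM, WM = 1, 2, 3


def dice(seg_a, seg_b, label):
    """Dice similarity index between the voxels assigned `label` in two label maps."""
    a = np.asarray(seg_a) == label
    b = np.asarray(seg_b) == label
    denom = a.sum() + b.sum()
    if denom == 0:
        # Both masks empty: treated here as perfect agreement (an assumed convention).
        return 1.0
    return 2.0 * np.logical_and(a, b).sum() / denom


def tissue_volume_ml(seg, label, voxel_size_mm=(1.0, 1.0, 1.0)):
    """Volume of one tissue class in millilitres: voxel count times voxel volume (1 ml = 1000 mm^3)."""
    voxel_ml = float(np.prod(voxel_size_mm)) / 1000.0
    return int((np.asarray(seg) == label).sum()) * voxel_ml


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean, one way to summarise spread across scans."""
    v = np.asarray(values, dtype=float)
    return v.std(ddof=1) / v.mean()
```

In use, one would compute `dice` between segmentations of the same subject from different sessions for each tissue class, and pass the per-session GM or WM volumes produced by a single tool to `coefficient_of_variation`; a lower value indicates more stable volumetric estimates across scanners and sites.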
