A multi-institute automated segmentation evaluation on a standard dataset: Findings from the international workshop on osteoarthritis imaging segmentation challenge

Arjun Desai, M O'Brian, Drew A Torigian, Akshay Pai, Cem M Deniz, Ravinder R Regatte, Valentina Pedoia, Io Flament, Christian Igel, Mathias Perslev, Mehmet Akçakaya, Claudia Iriondo, Aliasghar Mortazi, Ulaş Bağcı, Francesco Calivà, Vladimír Juráš, Jutta Ellermann, Kunio Nakamura, Erik B Dam, X Li, Sibaji Gaj, Sharmila Majumdar, Radhika Tibrewala, Brian Hargreaves, Naji Khosravan, Sachin Jambawalikar, Garry E Gold, Mingrui Yang, Akshay Chaudhari

doi:10.1016/j.joca.2020.02.477

Abstract

Purpose: Changes in cartilage thickness are predictive of radiographic joint-space loss and joint arthroplasty. While manual segmentation is the gold standard for evaluating cartilage morphology, it is time-consuming and has high inter-reader variability. Advances in deep learning and convolutional neural networks (CNNs) are promising for automatic tissue segmentation; however, the heterogeneity of the datasets used for network evaluation has limited widespread adoption of these techniques. To address these limitations, a segmentation challenge was organized at the 2019 International Workshop on Osteoarthritis Imaging (IWOAI). Here, we summarize the challenge submissions and discuss the efficacy of diverse, multi-institutional deep-learning approaches for segmenting knee cartilage and meniscus.

Methods: For the challenge, six teams trained CNNs to segment femoral cartilage, tibial cartilage, patellar cartilage, and menisci from 3D sagittal double-echo steady-state scans from the Osteoarthritis Initiative. The dataset consisted of 88 subjects scanned at two timepoints, split into cohorts of 60 subjects for training (baseline Kellgren-Lawrence grade (KLG) 1/2/3/4 distribution of 1/22/36/1), 14 for validation (1/4/8/1), and 14 for testing (0/5/8/1). Challenge participants were blinded to all subject-identifying information. Approaches varied across teams in CNN design and data augmentation methods and are presented in a blinded manner below. Team 1 trained a multi-class 3D U-Net with dilated convolutions using a joint weighted cross-entropy and soft-Dice loss. Team 2 used a DeeplabV3+ architecture with dense convolutional blocks and a soft-Dice loss. Team 3 designed a multi-stage network built with a cascaded ensemble of 3D and 2D V-Nets, and used intensity and geometric transforms for data augmentation. Team 4 sampled 2D slices from multiple planes in the volume to train a 2D U-Net with batch normalization and nearest-neighbor upsampling.
Team 5 used a generative adversarial framework that differentiated between real and generated 2D slices and 2D volumetric projections of segmentations to supervise the segmentation network. Following the challenge, a sixth submission (Team 6) used a simplified 2D, multi-class U-Net optimized with a soft-Dice loss. Dice overlap (Dice), volumetric overlap error (VOE), coefficient of variation (CV), and average symmetric surface distance (ASSD) were used to assess pixel-wise segmentation accuracy against expert-annotated ground truth. Cartilage thickness was computed for the automatic and manual segmentations. Inter-network segmentation Dice overlaps were used to evaluate the similarity between different networks. Correlation between pixel-wise segmentation metrics (Dice, VOE, CV, and ASSD) and cartilage thickness error was measured using Pearson correlation coefficients (R). Statistical comparisons were performed using Kruskal-Wallis tests and Dunn post-hoc tests with Bonferroni correction (α=0.05).

Results: All networks showed similar segmentation performance (violin plots in Figure 1). No significant differences were observed in Dice, CV, VOE, or ASSD for femoral cartilage (p=1.0), tibial cartilage (p=1.0), patellar cartilage (p=1.0), or menisci (p=1.0) among the four top-performing networks (Teams 1, 3, 4, and 6). Inter-network Dice overlaps were highest for femoral cartilage and above 0.85 for all tissues (Figure 2). There was no systematic bias or significant difference among a majority of the networks (p=0.99) for thickness estimates (Bland-Altman plots in Figure 3). Correlation between pixel-wise segmentation accuracy metrics and cartilage thickness ranged from very weak to moderate (highest R=0.41; thickness error vs. segmentation metrics plotted in Figure 4). Highest correlations were observed with femoral cartilage thickness (R less than 0.25), while very weak correlation was observed with tibial cartilage (R less than 0.2).
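As an illustration of the overlap metrics named in the Methods, the sketch below computes Dice, VOE, and CV for a single tissue from two binary masks. It is a minimal example under stated assumptions, not the challenge's evaluation code: the function name `overlap_metrics` and the set-of-voxel-coordinates representation are hypothetical, and CV is taken here as the population standard deviation of the two segmented volumes divided by their mean, since the abstract does not give its formula. ASSD is omitted because it requires surface extraction and distance computations.

```python
import statistics


def overlap_metrics(pred, truth):
    """Overlap metrics for one tissue.

    pred, truth: sets of voxel coordinates labeled as the tissue
    (a hypothetical representation chosen for this sketch).
    """
    n_pred, n_truth = len(pred), len(truth)
    if n_pred + n_truth == 0:
        raise ValueError("both masks are empty; metrics are undefined")

    inter = len(pred & truth)   # voxels labeled by both
    union = len(pred | truth)   # voxels labeled by either

    # Dice overlap: 2|A ∩ B| / (|A| + |B|)
    dice = 2 * inter / (n_pred + n_truth)

    # Volumetric overlap error: 1 - |A ∩ B| / |A ∪ B|
    voe = 1 - inter / union

    # Coefficient of variation of the two volumes (assumed definition):
    # population standard deviation divided by the mean.
    volumes = [n_pred, n_truth]
    cv = statistics.pstdev(volumes) / statistics.mean(volumes)

    return {"dice": dice, "voe": voe, "cv": cv}


# Toy masks with made-up voxel coordinates, for illustration only.
automatic = {(0, 0, 0), (0, 0, 1), (0, 1, 0)}
manual = {(0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 1, 1)}
print(overlap_metrics(automatic, manual))
```

Dice and VOE both depend on the intersection of the two masks, whereas CV as defined here depends only on the two volumes, so two masks of equal size that do not overlap at all would still give CV = 0.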
Conclusions: Despite the wide variety of network approaches, most methods achieved similar segmentation and thickness accuracy across all tissues, along with high inter-network Dice correlations. The similarity in performance and limitations suggests that independent networks, regardless of their design and training framework, may learn to represent and segment the knee similarly. While networks performed comparably, there was variability in their thickness estimates. The correlation between standard segmentation metrics and cartilage thickness was weak, suggesting that traditional evaluation metrics on high-performing models may not be predictive of differences in thickness accuracy outcomes. Through the segmentation challenge, we created a standardized and easy-to-use dataset to train and evaluate knee segmentation algorithms. Using deep-learning-based segmentation algorithms from multiple institutions, we showed that networks with varying training paradigms achieve similar performance and that, among models achieving high segmentation performance, current segmentation accuracy metrics are weakly correlated with cartilage thickness endpoints.
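The thickness comparison described above rests on two standard quantities: Bland-Altman bias with limits of agreement between automatic and manual thickness, and the Pearson correlation between a segmentation metric and thickness error. The sketch below shows one way these could be computed. The function names and the sample values are hypothetical and made up for illustration; they are not results from the challenge, and the 1.96 multiplier is the conventional choice for 95% limits of agreement rather than something stated in the abstract.

```python
import math
import statistics


def bland_altman(auto, manual):
    """Return (bias, lower limit, upper limit) for paired measurements.

    bias is the mean of the paired differences (auto - manual); the
    limits of agreement are bias ± 1.96 sample standard deviations.
    """
    diffs = [a - m for a, m in zip(auto, manual)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation, needs >= 2 pairs
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


# Made-up per-scan values, for illustration only (thickness in mm).
auto_thickness = [2.31, 2.05, 2.48, 1.97, 2.20]
manual_thickness = [2.28, 2.10, 2.41, 2.02, 2.25]
dice_scores = [0.90, 0.88, 0.91, 0.87, 0.89]

bias, lower, upper = bland_altman(auto_thickness, manual_thickness)
abs_error = [abs(a - m) for a, m in zip(auto_thickness, manual_thickness)]
print("bias:", bias, "limits of agreement:", (lower, upper))
print("R (Dice vs. absolute thickness error):", pearson_r(dice_scores, abs_error))
```

A bias near zero with narrow limits would indicate agreement between automatic and manual thickness, while a low R would mean that scans with higher Dice do not reliably have lower thickness error.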


Journal: Osteoarthritis and Cartilage	Publication Date: Apr 1, 2020
Citations: 3	License type: elsevier-specific: oa user license


