Total metabolic tumor volume (TMTV) is prognostic in lymphoma. However, cutoff values for risk stratification vary markedly according to the tumor delineation method used. We aimed to create a standardized TMTV benchmark dataset, allowing TMTV to be tested and applied as a reproducible biomarker. Methods: Sixty baseline 18F-FDG PET/CT scans were identified with a range of disease distributions (20 follicular, 20 Hodgkin, and 20 diffuse large B-cell lymphoma). TMTV was measured by 12 nuclear medicine experts, each analyzing 20 cases split across subtypes, with each case processed by 3-4 readers. LIFEx or ACCURATE software was chosen according to reader preference. Analysis was performed stepwise: TMTV1 with automated preselection of lesions using an SUV of at least 4 and a volume of at least 3 cm3, with single-click removal of physiologic uptake; TMTV2 with additional removal of reactive bone marrow and spleen with single clicks; TMTV3 with manual editing to remove other physiologic uptake, if required; and TMTV4 with optional addition of lesions using mouse clicks with an SUV of at least 4 (no volume threshold). Results: The final TMTV (TMTV4) ranged from 8 to 2,288 cm3, showing excellent agreement among all readers in 87% of cases (52/60), with a difference of less than 10% or less than 10 cm3. In 70% of the cases, TMTV4 equaled TMTV1, requiring no additional reader interaction. Differences in TMTV4 were exclusively related to reader interpretation of lesion inclusion or physiologic high-uptake region removal, not to the choice of software. For 5 cases, large TMTV differences (>25%) were due to disagreement about inclusion of diffuse splenic uptake. Conclusion: The proposed segmentation method enabled highly reproducible TMTV measurements, with minimal reader interaction in 70% of the patients. The inclusion or exclusion of diffuse splenic uptake requires definition of specific criteria according to lymphoma subtype.
The publicly available proposed benchmark allows comparison of study results and could serve as a reference to test improvements using other segmentation approaches.
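The automated preselection step (TMTV1) and the reader-agreement criterion described above can be illustrated with a minimal Python sketch. This is not the LIFEx or ACCURATE implementation. The function names `tmtv_preselect` and `readers_agree` are hypothetical, and the 6-connected neighborhood, the measurement of reader difference as the max-min spread relative to the mean, and the 1 cm3 toy voxel size are all assumptions made for illustration. The sketch also omits the interactive steps (removal of physiologic uptake, manual editing, lesion addition).

```python
from collections import deque

# Assumption: regions are formed from face-adjacent (6-connected) voxels.
NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def tmtv_preselect(suv, voxel_volume_cm3, suv_min=4.0, min_lesion_cm3=3.0):
    """Sum the volume of connected regions with SUV >= suv_min and
    volume >= min_lesion_cm3. `suv` is a nested list indexed [z][y][x]."""
    nz, ny, nx = len(suv), len(suv[0]), len(suv[0][0])
    seen = set()
    kept = []
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                if (z, y, x) in seen or suv[z][y][x] < suv_min:
                    continue
                # Flood-fill one connected region above the SUV threshold.
                queue = deque([(z, y, x)])
                seen.add((z, y, x))
                n_voxels = 0
                while queue:
                    cz, cy, cx = queue.popleft()
                    n_voxels += 1
                    for dz, dy, dx in NEIGHBORS:
                        p = (cz + dz, cy + dy, cx + dx)
                        inside = 0 <= p[0] < nz and 0 <= p[1] < ny and 0 <= p[2] < nx
                        if inside and p not in seen and suv[p[0]][p[1]][p[2]] >= suv_min:
                            seen.add(p)
                            queue.append(p)
                volume = n_voxels * voxel_volume_cm3
                # Apply the minimum-volume criterion per region.
                if volume >= min_lesion_cm3:
                    kept.append(volume)
    return sum(kept), kept


def readers_agree(tmtvs_cm3, rel_tol=0.10, abs_tol_cm3=10.0):
    """True if the readers differ by less than 10% or less than 10 cm3.
    Assumption: difference = max - min, taken relative to the mean."""
    spread = max(tmtvs_cm3) - min(tmtvs_cm3)
    mean = sum(tmtvs_cm3) / len(tmtvs_cm3)
    return spread < abs_tol_cm3 or (mean > 0 and spread < rel_tol * mean)


# Toy volume (3 x 4 x 4 voxels, 1 cm3 each, purely illustrative).
suv = [[[0.0] * 4 for _ in range(4)] for _ in range(3)]
for x in range(4):
    suv[0][0][x] = 5.0          # 4 connected voxels: 4 cm3, kept
suv[2][3][2] = suv[2][3][3] = 6.0  # 2 connected voxels: 2 cm3, dropped

total_cm3, regions = tmtv_preselect(suv, voxel_volume_cm3=1.0)
print(total_cm3, regions)
print(readers_agree([1000.0, 1040.0, 1080.0]))
```

In the toy volume, only the 4-voxel region passes both thresholds, so the small high-uptake focus is excluded by the volume criterion alone. A real pipeline would derive the voxel volume from the PET image spacing rather than fixing it.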