Measurement of repeatability and reproducibility (R&R) is necessary to realize the full potential of positron emission tomography (PET). Several studies have evaluated the reproducibility of PET using 18F-FDG, the most common PET tracer used in oncology, but similar studies using other PET tracers are scarce. Even fewer assess agreement and R&R with statistical methods designed explicitly for the task. 18F-(2S,4R)-4-fluoro-glutamine (18F-Gln) is a PET tracer designed for imaging glutamine uptake and metabolism. This study demonstrates high reproducibility and repeatability of 18F-Gln uptake measurements for in vivo research. Twenty mice bearing colorectal cancer cell line xenografts were injected with ~9 MBq of 18F-Gln and imaged in an Inveon microPET. Three individuals analyzed the tumor uptake of 18F-Gln using the same set of images, the same image analysis software, and the same analysis method. Scans were randomly re-ordered for a second repeatability measurement 6 months later. Statistical analyses were performed using the methods of Bland and Altman (B&A), Gauge Reproducibility and Repeatability (Gauge R&R), and Lin's Concordance Correlation Coefficient. A comprehensive equivalency test, designed to reject a null hypothesis of non-equivalence, was also conducted. In a two-way random effects Gauge R&R model, the variance among mice and the measurement variance were 0.5717 and 0.024, respectively. Reproducibility and repeatability accounted for 31% and 69% of the total measurement error, respectively. B&A repeatability coefficients for analysts 1, 2, and 3 were 0.16, 0.35, and 0.49, respectively. One-half B&A agreement limits between analysts 1 and 2, 1 and 3, and 2 and 3 were 0.27, 0.47, and 0.47, respectively. The mean square deviation and total deviation index were lowest for analysts 1 and 2, while the coverage probabilities and coefficients of individual agreement were highest for that pair.
Finally, the definitive agreement inference hypothesis test for equivalency demonstrated that all three confidence intervals for the average difference of means from repeated measures lay within our a priori limits of equivalence (i.e., ±0.5 %ID/g). Our data indicate high reproducibility and repeatability at both the individual-analyst and laboratory levels. Assessing R&R with appropriate statistical methods is critical, and the practice should be adopted by the broader imaging community.
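As a rough illustration of three of the agreement statistics named above, the following Python sketch computes B&A half-width agreement limits, a B&A repeatability coefficient for two repeated reads, and Lin's concordance correlation coefficient. It uses common textbook definitions, which may differ in detail from the computations performed in the study. The function names and the uptake values are hypothetical and are not data from this work.

```python
import numpy as np


def bland_altman_limits(a, b):
    """Return (bias, half_width) of the 95% Bland-Altman agreement limits.

    The limits are bias +/- half_width, with half_width = 1.96 * SD of the
    paired differences (a normal approximation).
    """
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return d.mean(), 1.96 * d.std(ddof=1)


def repeatability_coefficient(first, second):
    """Repeatability coefficient for two repeated reads by one analyst.

    Uses RC = 1.96 * sqrt(2) * s_w, where the within-subject variance for
    paired repeats is estimated as s_w^2 = mean(d^2) / 2.
    """
    d = np.asarray(first, dtype=float) - np.asarray(second, dtype=float)
    s_w = np.sqrt(np.mean(d ** 2) / 2.0)
    return 1.96 * np.sqrt(2.0) * s_w


def lins_ccc(x, y):
    """Lin's concordance correlation coefficient (population moments)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mx, my = x.mean(), y.mean()
    cov = np.mean((x - mx) * (y - my))
    return 2.0 * cov / (x.var() + y.var() + (mx - my) ** 2)


# Hypothetical tumor uptake values (%ID/g) for five mice; illustration only.
analyst_1 = [2.1, 3.4, 1.8, 2.9, 4.0]
analyst_2 = [2.3, 3.3, 1.9, 3.2, 3.8]

bias, half_width = bland_altman_limits(analyst_1, analyst_2)
print("B&A bias:", bias, "half-width of agreement limits:", half_width)
print("Repeatability coefficient:", repeatability_coefficient(analyst_1, analyst_2))
print("Lin's CCC:", lins_ccc(analyst_1, analyst_2))
```

In this sketch the same pair of arrays is reused for all three statistics purely for brevity. In a design like the one described, the repeatability coefficient would be computed from one analyst's two reads of the same scans, while the agreement limits and the concordance coefficient would compare different analysts.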