Abstract

Introduction: Ki67 is an important biomarker for breast cancer, but a lack of scoring standardization has limited its clinical use. A previous International Ki67 in Breast Cancer Working Group reproducibility study (phase 1) found problematic inter-laboratory variability when labs employed their own scoring methods. This follow-up study (phase 2) devised and tested strategies to harmonize Ki67 scoring.

Methods: Web-based calibration: 17 labs were sent simple instructions prescribing a scoring pattern, with supporting sample images, and were asked to score web images of 9 “training” and 9 “test” breast cancer cases representing a range of Ki67 values. Cases were selected from centrally MIB-1-stained TMA cores used in phase 1. Labs yielding consistent scores served as reference labs. Software tracked object selection and scoring. After scoring the training cases, labs were asked to learn from discrepancies by comparing their scored images with scored reference images. “Passing” the training was required before proceeding to testing. The study allowed multiple attempts on the training cases but only one attempt on the test cases. Statistical criteria for success, reflecting deviation from reference scores (RMSE < 0.6, MAXDEV < 1.0), were pre-specified. Phase 2 on glass: Scoring instructions similar to those used in the calibration were provided to 16 of the calibration-study labs, which were asked to apply the same standardized method to glass TMA slides (50 cases from the phase 1 TMA, none used in the calibration). Three sections from the TMA were circulated among 3 groups. Labs’ Ki67 scores were log2-transformed to approximate a normal distribution. Sources of variation (e.g., patient, lab) were analyzed using 2-way crossed random effects models, with reproducibility quantified via the intraclass correlation coefficient (ICC; range 0-1, with 1 = highest agreement).
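The pre-specified calibration criteria above can be illustrated with a short sketch. The abstract names only the cutoffs (RMSE < 0.6, MAXDEV < 1.0), so the exact score scale and reference-score definition below are assumptions; the helper is illustrative, not the study's actual procedure.

```python
import numpy as np

def calibration_pass(lab_scores, ref_scores, rmse_cut=0.6, maxdev_cut=1.0):
    """Illustrative pass/fail check of a lab's scores against reference scores.

    RMSE: root-mean-square deviation from the reference scores.
    MAXDEV: largest absolute deviation on any single case.
    Cutoff values are from the abstract; the deviation definitions and the
    log-scale assumption are ours.
    """
    dev = np.asarray(lab_scores, float) - np.asarray(ref_scores, float)
    rmse = np.sqrt(np.mean(dev ** 2))
    maxdev = np.max(np.abs(dev))
    return bool(rmse < rmse_cut and maxdev < maxdev_cut)

# Hypothetical lab vs. reference scores on 9 test cases (log2 scale assumed)
ref = [1.0, 2.0, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0]
lab = [1.2, 2.1, 2.8, 3.9, 4.1, 4.3, 5.2, 5.4, 6.3]
print(calibration_pass(lab, ref))  # small deviations, so True
```

A lab failing either criterion on the training cases would repeat the exercise; the test cases allowed only a single attempt.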
The pre-specified benchmark of success was a scoring ICC consistent with a true value of 0.9 and significantly greater than the observed overall ICC from phase 1 (0.7).

Results: Web-based calibration: Lab performance through the calibration exercise, from training to testing, showed trends of improvement (averaged across labs, RMSE decreased from 0.6 to 0.4 and MAXDEV from 1.6 to 0.9), although not statistically significant, possibly owing to the limited number of labs (paired t-test: p = 0.07 for RMSE, 0.06 for MAXDEV). Phase 2 on glass: Whereas inter-laboratory reproducibility in phase 1 was only moderate overall, standardizing the scoring method yielded scoring ICCs of 0.92 (0.79-0.95), 0.96 (0.78-0.97), and 0.94 (0.77-0.97) in phase 2 for the three groups, each receiving one section of the TMA. Substantial discrepancies persisted among labs on some cases, however, including in the range of clinically relevant cutoffs.

Conclusions: Previous evidence showed that absolute values and cutoffs for Ki67 cannot be transferred between labs without careful standardization of scoring methodology. Use of a common scoring method, after training with a web-based calibration tool, achieves high inter-laboratory reproducibility in Ki67 scoring on centrally stained glass TMA slides. Future research should extend this approach to unstained biopsies and whole sections, and link results to outcomes, particularly for cases near cutoffs.

Citation Information: Cancer Res 2013;73(24 Suppl): Abstract nr S2-07.
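The agreement measure reported above (an ICC from a two-way crossed random effects model on log2-transformed scores) can be sketched minimally as follows. The study's actual model fitting and confidence intervals are not reproduced here; as a simple stand-in, this uses the classical ANOVA-based absolute-agreement ICC(2,1) for a fully crossed patient-by-lab layout, and the example data are hypothetical.

```python
import numpy as np

def icc_2way_random(scores):
    """ANOVA-based two-way random effects, absolute-agreement ICC(2,1).

    scores: (n_patients, n_labs) array, here of log2-transformed Ki67 values,
    with every lab scoring every patient (fully crossed design).
    """
    n, k = scores.shape
    grand = scores.mean()
    row_means = scores.mean(axis=1)   # per-patient means
    col_means = scores.mean(axis=0)   # per-lab means
    ss_rows = k * ((row_means - grand) ** 2).sum()
    ss_cols = n * ((col_means - grand) ** 2).sum()
    ss_err = ((scores - grand) ** 2).sum() - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )

# Hypothetical raw Ki67 percentages: 5 patients scored by 3 labs
raw = np.array([
    [12.0, 14.0, 11.0],
    [30.0, 28.0, 33.0],
    [ 5.0,  6.0,  4.5],
    [45.0, 50.0, 42.0],
    [20.0, 18.0, 22.0],
])
print(round(icc_2way_random(np.log2(raw)), 3))
```

Because between-patient variation dominates between-lab variation in this toy data, the ICC is high, mirroring the phase 2 finding that a shared scoring method pushes agreement well above the phase 1 level of 0.7.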
