Abstract

Aims: (i) To determine whether between-pathologist agreement for Ki67 is adequate for clinical application when a standardised scoring protocol is followed. (ii) To compare between-pathologist agreement when scoring hotspots vs a global method that averages Ki67 across each section.

Background: The nuclear proliferation biomarker Ki67 has multiple potential roles in breast cancer, including aiding prognosis-based decisions, but it shows unacceptable between-laboratory variability. The International Ki67 Working Group has undertaken a systematic program to determine whether Ki67 measurement can be analytically validated and standardised across laboratories. In phase 1, variability in visual interpretation was the most important source of variability. Phase 2 showed that agreement improved significantly when the same tumors on tissue microarrays were scored following clearly defined scoring methods. We now assess whether acceptable performance can be achieved on core-cut biopsies using a standardised method.

Methods: Three adjacent sections from each of 30 primary ER+ breast cancers were centrally stained for Ki67, assembling three sets of 30 stained tumor sections that were circulated among 22 laboratories in 11 countries. All laboratories scored Ki67 by two methods: (a) global, in which 4 fields of 100 cells each were selected to represent any heterogeneity; and (b) hotspot, in which the field with the highest Ki67 staining percentage was selected and 500 cells were scored. Ki67 scores were log2-transformed for statistical analyses and back-transformed for presentation. The primary objective was to assess whether either method could achieve an intraclass correlation coefficient (ICC) significantly greater than 0.8, considered substantial to almost-perfect agreement. Secondary objectives were to assess which method had the highest observed ICC and whether pathologists identified the same "hotspots".
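The scoring arithmetic and the agreement statistic described in the Methods can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the Working Group's software: all function names are hypothetical, the global score pools equally weighted fields, scores are assumed to be strictly positive before the log2 transform, and because the abstract does not state which ICC model was used, the common two-way random-effects, absolute-agreement, single-rater form ICC(2,1) (Shrout and Fleiss) stands in here.

```python
import math
from statistics import fmean


def global_score(fields: list[tuple[int, int]]) -> float:
    """Global Ki67 %: pooled positive nuclei over all scored fields.

    Assumption: fields are weighted equally by pooling (positive, total)
    counts; with 4 fields of 100 cells this equals the mean field percentage.
    """
    positive = sum(p for p, _ in fields)
    total = sum(t for _, t in fields)
    return 100.0 * positive / total


def hotspot_score(positive: int, total: int = 500) -> float:
    """Hotspot Ki67 %: positive nuclei among the cells counted in one field."""
    return 100.0 * positive / total


def geometric_mean(scores: list[float]) -> float:
    """Average on the log2 scale, then back-transform to the % scale."""
    if any(s <= 0 for s in scores):
        # Assumption: no zero scores; real data may need an offset.
        raise ValueError("log2 transform needs strictly positive scores")
    return 2 ** fmean(math.log2(s) for s in scores)


def icc_2_1(x: list[list[float]]) -> float:
    """Two-way random-effects, absolute-agreement, single-rater ICC(2,1).

    x[i][j] is the score of tumor i from laboratory j (n tumors, k labs).
    """
    n, k = len(x), len(x[0])
    grand = fmean(v for row in x for v in row)
    row_means = [fmean(row) for row in x]
    col_means = [fmean(x[i][j] for i in range(n)) for j in range(k)]
    # Partition total sum of squares into tumor, lab and residual parts.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in x for v in row)
    ss_err = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )


# Usage with made-up numbers (not study data): 3 tumors scored by 3 labs.
scores = [[12.0, 15.0, 10.0], [30.0, 28.0, 35.0], [5.0, 6.5, 4.0]]
log_scores = [[math.log2(v) for v in row] for row in scores]
print(round(icc_2_1(log_scores), 2))
print(round(geometric_mean([row[0] for row in scores]), 1))
```

Computing the ICC on log2 scores mirrors the transform stated in the Methods. Testing whether the ICC is significantly greater than 0.8 would additionally require a confidence interval (for example from the F distribution or a bootstrap over tumors), which this sketch omits.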
Results: The ICC for the global method was 0.88 (95%CI: 0.81-0.93) and therefore met the prespecified success criterion. The ICC for the hotspot method was 0.84 (95%CI: 0.77-0.92); its CI extended below the success criterion, which was therefore not met. Across the 22 labs, the geometric mean of each lab's 30 scores ranged from 14.4 to 27.9 for the global method and from 17.4 to 40.2 for the hotspot method. The overall mean (95% CI) of these values was 19.8 (18.5-21.3) and 26.4 (24.6-28.3), respectively. Visually, there was moderately strong agreement across laboratories in the location of the hotspots selected in the core-cuts. The impact of Ki67 score variability on prognostic estimates from the integrated IHC4 + clinical treatment score will be assessed. After selection of the areas to score, the median times for cell counting were 3 and 4 minutes for the global and hotspot methods, respectively.

Conclusions: The global method met the prespecified criterion of success; it should now be evaluated for clinical validity in appropriate cohorts of samples. The hotspot method showed slightly less agreement between labs. Scoring times are practical using the counting software we are making publicly available. Establishing external quality assessment schemes is likely to improve agreement between labs further. (Supported by a grant from the Breast Cancer Research Foundation).

Citation Format: Dowsett M, Leung SCY, Zabaglo L, Arun I, Badve SS, Bane AL, Bartlett JMS, Borgquist S, Chang MC, Dodson A, Enos RA, Fineberg S, Focke CM, Gao D, Gown AM, Grabau D, Gutierrez C, Hugh JC, Kos Z, Lænkholm A-V, Lin M-G, Mastropasqua MG, Moriya T, Nofech-Mozes S, Osborne CK, Penault-Llorca FM, Piper T, Sakatani T, Salgado R, Starczynski J, Viale G, Hayes DF, McShane LM, Nielsen TO. Analytical validation of a standardized scoring protocol for Ki67: Phase-3 of an international multicenter collaboration. [abstract]. In: Proceedings of the Thirty-Eighth Annual CTRC-AACR San Antonio Breast Cancer Symposium: 2015 Dec 8-12; San Antonio, TX. Philadelphia (PA): AACR; Cancer Res 2016;76(4 Suppl):Abstract nr P1-01-01.