PD-L1 expression in NSCLC correlates with increased response to pembrolizumab, supporting its role as a predictive biomarker, but the reproducibility of pathologists’ scoring of PD-L1 requires further investigation. The primary objective of the DREAM study was to assess the intra- and inter-observer reproducibility of scoring PD-L1 expression in NSCLC. The secondary objective was to assess the impact of training on reproducibility. The study was a blinded, pathologist reproducibility study of scoring PD-L1 expression in NSCLC cases stained with the PD-L1 22C3 pharmDx™ kit using the Dako Automated Link 48 Platform. Two pathologists previously trained and certified by Dako scored 789 specimens to form the gold standard. From these specimens, 60 were randomly selected to evaluate a 1% cut-point and 60 to evaluate a 50% cut-point. Both sample sets were designed to include 50% positive/negative specimens and 20-30 specimens close to each cut-point. Ten pathologists were randomly assigned to two subgroups. Subgroup 1 analyzed all samples on two consecutive days. Subgroup 2 performed the same assessments, except that they received a one-hour training session prior to the second assessment. The overall percent agreement (OPA) for intra-observer reproducibility was 89.7% (95% CI: 85.7; 92.6) for the 1% cut-point sample set and 91.3% (95% CI: 87.6; 94.0) for the 50% cut-point sample set. The OPAs for inter-observer reproducibility of all ten pathologists were 84.2% (95% CI: 82.8; 85.5) and 81.9% (95% CI: 80.4; 83.3) for the 1% and 50% cut-point sample sets, respectively. There was substantial agreement at both the 1% cut-point (prevalence-adjusted bias-adjusted kappa 0.68 (95% CI: 0.65; 0.71)) and the 50% cut-point (prevalence-adjusted bias-adjusted kappa 0.64 (95% CI: 0.61; 0.67)). Training was found to have little or no impact on the inter- or intra-observer reproducibility in subgroup 2.
The OPAs for the inter-observer reproducibility assessments were 82.0% and 82.3% for the first and second assessments of the 1% cut-point sample set, respectively, and 78.3% and 81.7% for the first and second assessments of the 50% cut-point sample set, respectively. The exploratory analyses showed that the sensitivity and specificity of the pathologists’ assessments, compared with the gold standard assessment, were 84.3% and 91.3%, respectively, for the 1% cut-point and 56.3% and 94.0%, respectively, for the 50% cut-point. There is high intra-observer reproducibility and substantial inter-observer agreement in pathologists’ assessment of PD-L1 expression in NSCLC at 1% and 50% cut-points.
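As a rough illustration of how the agreement statistics reported above relate to one another, the following Python sketch computes OPA, the prevalence-adjusted bias-adjusted kappa (PABAK), and sensitivity/specificity for binary positive/negative calls. It is not the study's analysis code: the function names and the toy ratings are hypothetical, and the study's exact pooling of pathologist pairs and its confidence-interval method are not reproduced here. For two categories, PABAK reduces to 2 × OPA − 1, which is consistent with the reported values (OPA 84.2% gives about 0.68; OPA 81.9% gives about 0.64).

```python
def overall_percent_agreement(calls_a, calls_b):
    """Fraction of specimens on which two sets of binary calls agree."""
    if len(calls_a) != len(calls_b) or len(calls_a) == 0:
        raise ValueError("call lists must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(calls_a, calls_b))
    return matches / len(calls_a)


def pabak(observed_agreement, n_categories=2):
    """Prevalence-adjusted bias-adjusted kappa: (k * Po - 1) / (k - 1).

    For binary calls (k = 2) this is simply 2 * Po - 1.
    """
    return (n_categories * observed_agreement - 1) / (n_categories - 1)


def sensitivity_specificity(calls, reference):
    """Sensitivity and specificity of binary calls against a reference standard."""
    tp = sum(c == 1 and r == 1 for c, r in zip(calls, reference))
    fn = sum(c == 0 and r == 1 for c, r in zip(calls, reference))
    tn = sum(c == 0 and r == 0 for c, r in zip(calls, reference))
    fp = sum(c == 1 and r == 0 for c, r in zip(calls, reference))
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical toy data (1 = PD-L1 positive at the cut-point, 0 = negative).
gold_standard = [1, 1, 0, 0, 1, 0]
pathologist = [1, 0, 0, 0, 1, 1]

opa = overall_percent_agreement(pathologist, gold_standard)
kappa = pabak(opa)
sens, spec = sensitivity_specificity(pathologist, gold_standard)

# Consistency check against the OPAs reported in the abstract.
pabak_1pct = round(pabak(0.842), 2)
pabak_50pct = round(pabak(0.819), 2)
```

Because PABAK depends only on the observed agreement, it is insensitive to how many specimens are positive, which suits sample sets deliberately enriched with cases near the cut-point.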