Formative colonoscopy direct observation of procedural skills (DOPS) assessments were updated in 2016 and incorporated into UK training, but they lack validity evidence. We aimed to appraise the validity of DOPS assessments, benchmark performance, and evaluate competency development during training in diagnostic colonoscopy.

This prospective national study identified colonoscopy DOPS submitted to the UK training e-portfolio over an 18-month period. Generalizability analyses were conducted to evaluate internal structure validity and reliability. Benchmarking was performed using receiver operating characteristic (ROC) analyses. Learning curves for DOPS items and domains were studied, and multivariable analyses were performed to identify predictors of DOPS competency.

Across 279 training units, 10,749 DOPS submitted for 1,199 trainees were analyzed. The acceptable reliability threshold (G > 0.70) was achieved with 3 assessors performing 2 DOPS each. DOPS competency rates correlated with the unassisted cecal intubation rate (rho 0.404, P < 0.001). Demonstrating competency in 90% of assessed items provided optimal sensitivity (90.2%) and specificity (87.2%) for benchmarking overall DOPS competence. This threshold was attained in the following order: "preprocedure" (50-99 procedures), "endoscopic nontechnical skills" and "postprocedure" (150-199), "management" (200-249), and "procedure" (250-299) domains. At the item level, competency in "proactive problem solving" (rho 0.787) and "loop management" (rho 0.780) correlated most strongly with the overall DOPS rating (P < 0.001) and was the last to develop. Lifetime procedure count, DOPS count, trainer specialty, easier case difficulty, and higher cecal intubation rate were significant multivariable predictors of DOPS competence.

This study establishes milestones for competency acquisition during colonoscopy training and provides novel validity and reliability evidence supporting colonoscopy DOPS as a competency assessment tool.
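To illustrate the kind of ROC-based benchmarking described above, the sketch below searches candidate item-competency thresholds for the one that best discriminates an overall "competent" DOPS rating, scoring each threshold by Youden's J (sensitivity + specificity − 1). The data, function names, and candidate grid are all hypothetical, not the study's dataset or analysis code.

```python
# Hypothetical sketch of ROC-style threshold benchmarking (toy data).

def sens_spec(records, threshold):
    """Sensitivity and specificity of the rule
    'percent of items rated competent >= threshold'
    for predicting an overall competent DOPS rating."""
    tp = sum(1 for pct, overall in records if pct >= threshold and overall)
    fn = sum(1 for pct, overall in records if pct < threshold and overall)
    tn = sum(1 for pct, overall in records if pct < threshold and not overall)
    fp = sum(1 for pct, overall in records if pct >= threshold and not overall)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return sens, spec

def best_threshold(records, thresholds):
    # Youden's J = sensitivity + specificity - 1; take the maximizing cutoff.
    return max(thresholds, key=lambda t: sum(sens_spec(records, t)) - 1)

# Toy records: (percent of DOPS items rated competent, overall competent?)
records = [(95, True), (92, True), (90, True), (85, False),
           (70, False), (60, False), (91, True), (80, False)]
print(best_threshold(records, range(50, 101, 5)))  # → 90
```

In practice the study reports this kind of analysis yielding a 90% item-competency cutoff with 90.2% sensitivity and 87.2% specificity; the sketch only shows the mechanics of choosing a cutoff from an ROC sweep.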