The purpose of our study was to compare real-time, live observational scoring of operative performance with delayed, retrospective video review and to determine whether the evaluation method affected attainment of proficiency benchmarks. Sixteen arthroscopy/sports medicine fellows and 2 senior residents completed training to perform arthroscopic Bankart repairs (ABRs) and arthroscopic rotator cuff repairs (ARCRs) using a proficiency-based progression curriculum. Each final operative performance for 15 randomly selected ABRs and 13 ARCRs performed on cadavers was scored live (by observation during the procedure) and on delayed video review (6-8 weeks later) by 1 of 15 trained raters using validated metric-based (step and error) assessment tools. The inter-rater reliability (IRR) of live versus video review by a single rater was calculated, and changes in each trainee's attainment of the proficiency benchmarks were noted. The correlation coefficient (r) and the coefficient of determination (R2) were also calculated for the paired scores from the randomly selected performances. No significant differences in the observed IRR agreement or in attainment of the proficiency benchmarks were found when live and video assessment were compared for either ABR or ARCR. The correlation coefficients r and R2 were considerably lower than the IRR agreement coefficient for rotator cuff steps (R2 = 0.74 vs. IRR = 0.97, P = 0.001), Bankart errors (R2 = 0.73 vs. IRR = 0.98, P = 0.006), and rotator cuff errors (R2 = 0.48 vs. IRR = 0.98, P = 0.0002). Real-time live scoring and delayed video-based scoring are essentially equivalent for metric-based assessment of operative performance in ABRs and ARCRs. When the IRR agreement coefficient was compared with the correlation coefficients, the former showed greater homogeneity and measurement precision. Metric-based live scoring is reliable and accurate for operative performance assessment, including high-stakes evaluations.
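To make the distinction between the reported statistics concrete, the following is a minimal sketch of how the Pearson correlation coefficient r and R2 would be computed for paired live and video scores from a single rater. The paired score values below are hypothetical illustrations, not study data, and the study's IRR agreement coefficient is a separate statistic not reproduced here.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical paired step counts scored live vs. on delayed video review.
live = [18, 21, 19, 24, 22, 20]
video = [19, 21, 18, 25, 23, 20]

r = pearson_r(live, video)
r_squared = r ** 2  # R2 is the square of the correlation coefficient
print(f"r = {r:.3f}, R2 = {r_squared:.3f}")
```

Note that r measures only how well the paired scores track each other linearly; two raters could correlate strongly while systematically disagreeing on the absolute score, which is why an agreement coefficient such as the IRR is reported separately.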