Introduction: The Ottawa Emergency Department Shift Observation Tool (O-EDShOT) was recently developed to assess a resident's ability to safely run an emergency department (ED) shift and is supported by multiple sources of validity evidence. The O-EDShOT uses entrustability scales, which reflect the degree of supervision required for a given task. It has been shown to discriminate between learners of different levels and to differentiate between residents who were rated as able to safely run the shift and those who were not. In June 2018, we replaced norm-based daily encounter cards (DECs) with the O-EDShOT. With an ideal assessment tool, most of the score variability would be explained by variability in learners’ performances. In reality, however, much of the observed variability is explained by other factors. The purpose of this study was to determine what proportion of total score variability is accounted for by learner variability when using norm-based DECs versus the O-EDShOT. Methods: This was a prospective pre-/post-implementation study that included all daily assessments completed between July 2017 and June 2019 at The Ottawa Hospital ED. A generalizability analysis (G study) was performed to determine what proportion of total score variability was accounted for by each factor in the study (learner, rater, form, and post-graduate year [PGY] level) in both the pre- and post-implementation phases. We collected 12 months of data for each phase because we estimated that 6-12 months would be required to observe a measurable increase in entrustment scale scores within a learner. Results: A total of 3908 and 3679 assessments were completed by 99 and 116 assessors in the pre- and post-implementation phases, respectively. Our G study revealed that 21% of total score variance was explained by a combination of PGY level and the individual learner in the pre-implementation phase, compared to 59% in the post-implementation phase.
An average of 51 and 27 forms per learner would be required to achieve a reliability of 0.80 in the pre- and post-implementation phases, respectively. Conclusion: A substantially greater proportion of total score variability is explained by variability in learners’ performances with the O-EDShOT than with norm-based DECs. The O-EDShOT also requires fewer assessments to generate a reliable estimate of a learner's ability. This study suggests that the O-EDShOT is a more useful assessment tool than norm-based DECs and could be adopted in other emergency medicine training programs.
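
Illustration: The link between variance components and the number of forms needed for a reliability of 0.80 can be sketched with a simplified one-facet decision-study formula, in which reliability is the learner variance divided by the learner variance plus the residual variance averaged over n forms. This is a minimal sketch under stated assumptions, not the analysis used in this study: the function names and the variance values are hypothetical, and because the abstract does not report separate variance components, the sketch does not attempt to reproduce the reported 51 and 27 forms.

```python
import math


def reliability(learner_var: float, residual_var: float, n_forms: int) -> float:
    """Reliability of a learner's mean score over n_forms forms.

    Assumes a simplified one-facet design: every source of variance other
    than the learner is pooled into residual_var and averages out over forms.
    """
    return learner_var / (learner_var + residual_var / n_forms)


def forms_needed(learner_var: float, residual_var: float, target: float = 0.80) -> int:
    """Smallest number of forms whose mean reaches the target reliability."""
    if learner_var <= 0 or residual_var < 0 or not 0 < target < 1:
        raise ValueError("need learner_var > 0, residual_var >= 0, 0 < target < 1")
    # Solve learner_var / (learner_var + residual_var / n) >= target for n.
    n = target / (1 - target) * residual_var / learner_var
    # Small tolerance so floating-point noise does not add a spurious form.
    return max(1, math.ceil(n - 1e-9))


# Hypothetical variance components, chosen only for illustration.
few_forms = forms_needed(learner_var=1.0, residual_var=4.1)
many_forms = forms_needed(learner_var=0.5, residual_var=4.1)
```

Under this simplification, halving the learner variance while holding the residual variance fixed roughly doubles the number of forms required, which is the direction of the difference reported between the two phases.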