This article presents a validation study that investigates the relationship between mastery level descriptors and item difficulty in the National Tests of English (NTE) in Norway. The aim is to establish the extent to which the descriptors indicate item difficulty and thus support the argument that the mastery levels reflect the framework of the NTE. The strength of this argument has direct implications for validity and for the defensibility of moving from criterion-based descriptors to norm-based national data. In the study, a panel of 10 raters assigned level descriptors from 7 content categories to 80 test items, giving a total of 5,600 individual judgements. These judgements were compared with the actual test scores of around 46,000 pupils to establish whether the level descriptors assigned by raters can predict pupil performance on the tests. The results show a strong correlation between rater judgements and actual test scores, meaning that the descriptors offer an indication of item difficulty. However, some individual descriptor categories contain deviations from the expected order. We conclude that, while the level descriptors reflect the test’s framework and thus support the validity argument, the argument could be strengthened by revising some individual descriptors.