International large-scale assessments such as TIMSS administer achievement tests, based on analyses of national curricula, to compare student achievement across countries. The organizations that coordinate these studies use Rasch or more general item response theory (IRT) models in which all test items are assumed to measure a single latent ability. The test responses are used to estimate this ability, and the resulting ability scores are used to compare countries.

A central yet largely uncontested assumption of this approach is that the achievement tests measure an unobserved, unidimensional, continuous variable that is comparable across countries. One threat to this assumption is that countries, and even regions or school tracks within countries, have different curricula. When seeking to compare countries fairly, it seems imperative to account for the fact that national curricula differ and that some countries may not yet have taught the full test content. Nevertheless, existing IRT-based rankings ignore such differences.

The present study proposes a direct method to handle differing curricula and to create a fair ranking of educational quality across countries. The new method compares countries solely on test content that has already been taught: it uses information on whether students have mastered the skills taught in class and disregards content that has not yet been taught. Mastery is assessed via the deterministic-input, noisy, "and" gate (DINA) model, an interpretable and tractable cognitive diagnostic model. To illustrate the new method, we use data from TIMSS 1995 and compare the results to the IRT-based scores published in the international study report. We find a mismatch between the TIMSS test content and the national curriculum in every country.
At the same time, we observe a high correlation between the scores based on the new method and the conventional IRT scores. This finding underscores the robustness of the performance measures reported in TIMSS despite existing differences across national curricula.
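To make the DINA model concrete: it classifies each item response by whether the student masters all skills the item requires (a binary latent indicator, often denoted η), answering correctly with probability 1 − s (slip) for masters and g (guess) for non-masters. The following is a minimal sketch of that item response function; the function name and parameters are illustrative, not the study's implementation:

```python
import numpy as np

def dina_prob(alpha, q, g, s):
    """P(correct response) under the DINA model.

    alpha: binary skill-mastery vector(s) of the student(s)
    q:     the item's required-skill vector (a row of the Q-matrix)
    g:     guessing probability (non-masters answering correctly)
    s:     slip probability (masters answering incorrectly)
    """
    alpha = np.atleast_2d(alpha)
    # eta = 1 iff the student masters every skill the item requires
    eta = np.all(alpha >= np.asarray(q), axis=1)
    # Masters succeed unless they slip; non-masters succeed only by guessing
    return np.where(eta, 1.0 - s, g)

# A student mastering both required skills vs. one missing a skill
print(dina_prob([1, 1, 0], [1, 1, 0], g=0.2, s=0.1))  # [0.9]
print(dina_prob([1, 0, 0], [1, 1, 0], g=0.2, s=0.1))  # [0.2]
```

The conjunctive ("and" gate) structure is what makes the model suitable here: an item only credits mastery when every skill it requires has been taught and learned, which is exactly the information the proposed ranking method relies on.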