This paper describes four previously developed procedures for estimating conditional standard errors of measurement (SEMs) for scale scores: the IRT procedure (Kolen, Zeng, & Hanson, 1996), the binomial procedure (Brennan & Lee, 1999), the compound binomial procedure (Brennan & Lee, 1999), and the Feldt‐Qualls procedure (Feldt & Qualls, 1998). The four procedures rest on different assumptions. The IRT procedure relies on the assumptions of the unidimensional IRT model. The binomial and compound binomial procedures use the binomial model and the compound binomial model, respectively, as the distribution of errors. By contrast, the Feldt‐Qualls procedure does not depend on a particular psychometric model; it simply translates any estimated conditional raw‐score SEM into a conditional scale‐score SEM. The procedures are compared in a simulation study based on two‐dimensional data sets. The presence of two category dimensions violates the IRT unidimensionality assumption. The relative accuracy of the procedures in estimating conditional scale‐score SEMs is evaluated under various conditions. The effects of three types of raw‐score transformations are investigated: developmental standard scores, grade equivalents, and percentile ranks. All the procedures discussed appear viable. As a general recommendation, test users should select a procedure based on factors such as the type of scale score of concern, the characteristics of the test, the assumptions involved in the estimation procedure, and the feasibility and practicability of that procedure.
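To make the idea of a conditional scale‐score SEM under a binomial error model concrete, the following is a minimal sketch, not the authors' implementation. It assumes that, for an examinee with true proportion‐correct `pi`, the raw score on an `n`‐item test follows a binomial distribution, and that the conditional SEM is the standard deviation of the converted scale scores under that distribution. The function names and the raw‐to‐scale conversion table are hypothetical and chosen only for illustration.

```python
from math import comb, sqrt


def binomial_pmf(n, pi):
    """Probabilities of raw scores 0..n under a binomial(n, pi) error model."""
    return [comb(n, x) * pi**x * (1 - pi) ** (n - x) for x in range(n + 1)]


def conditional_scale_sem(pi, conversion):
    """Standard deviation of scale scores given true proportion-correct pi.

    `conversion` is a hypothetical raw-to-scale table: conversion[x] is the
    scale score assigned to raw score x, so the test has len(conversion) - 1 items.
    """
    n = len(conversion) - 1
    pmf = binomial_pmf(n, pi)
    mean = sum(p * s for p, s in zip(pmf, conversion))
    variance = sum(p * (s - mean) ** 2 for p, s in zip(pmf, conversion))
    return sqrt(variance)


# Hypothetical nonlinear conversion for a 10-item test (illustration only).
example_conversion = [100, 104, 109, 115, 122, 130, 139, 149, 160, 172, 185]
sem_mid = conditional_scale_sem(0.5, example_conversion)
sem_high = conditional_scale_sem(0.9, example_conversion)
```

When the conversion is the identity (scale score equals raw score), this quantity reduces to the familiar binomial raw‐score SEM, the square root of n·pi·(1 − pi). With a nonlinear conversion, the scale‐score SEM depends on how steeply the conversion rises near the examinee's expected raw score.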