Challenges in measuring early childhood development (ECD) at scale have been documented, yet little is known about the specific difficulties related to questionnaire design and question interpretation. The purpose of this paper is to discuss the challenges of measuring ECD at scale in the context of household surveys and to describe how they can be overcome. The paper draws on examples from the cognitive interviewing exercises conducted as part of the methodological work to develop a measure of ECD outcomes, the ECDI2030. It describes the methodological work carried out to inform the selection and improvement of question items and survey implementation tools as a fundamental step to reduce and mitigate systematic measurement error and improve data quality. The project consisted of five rounds of testing, comprising 191 one-on-one, in-depth cognitive interviews across six countries (Bulgaria, India, Jamaica, Mexico, Uganda, and the USA). Qualitative data analysis methods were used to determine matches and mismatches between the intention of items and respondents' interpretations that could result in false positive or false negative answers among subgroups of respondents. Key themes emerged that could potentially lead to systematic measurement error in population-based surveys on ECD: (1) willingness of the child to perform a task versus ability of the child to perform it; (2) performing a task versus performing it correctly; (3) identifying letters or numbers versus recognizing letters or numbers; (4) consistently performing a task versus correctly performing it; (5) applicability of the skills being asked about versus observability of those skills; and (6) language production versus language comprehension. Through an iterative process of testing and subsequent revision, improvements were made to item wording, response options, and interviewer training instructions. Given the difficulties inherent in population-level data collection in the context of global monitoring, this study's findings confirm the importance of cognitive testing as a crucial step in careful, culturally relevant, and sensitive questionnaire design and as a means of reducing response bias in cross-cultural contexts.