Abstract

Purpose
Cognitive load and arousal are cornerstones of many deception detection strategies and theories; consequently, measuring them effectively is critical. However, fundamental criteria for establishing the quality and accuracy of these measures have largely been overlooked. In this study, we examined the reliability and construct validity of common cognitive load and arousal measures.

Method
We obtained three independent secondary datasets in which participants (N = 238) had lied or told the truth about witnessing a suspicious event. Using a multitrait–multimethod analysis, we assessed participants' cognitive load and arousal with three types of measure: participants' self-reports, trained coders' observations, and objective behaviours.

Results
Although all measures were reliable, they achieved differing levels of validation. Specifically, measures of cognitive load showed evidence of convergent validity, but not discriminant validity. There was no empirical support for the construct validity of arousal measures.

Conclusions
These findings suggest that inconsistencies in the diagnosticity of cues to deception, and in support for deception theories, may be attributable to the measures employed. Researchers may not be assessing the constructs of interest, particularly in the case of arousal.