A growing number of scholars doubt whether questionnaire-based and other human-rater-based forms of behavior measurement are trustworthy, even though many of these measures meet psychometric best-practice standards. I identify the lack of behavioral counterfactuals as a common yet avoidable underlying problem, and the existence of behavioral counterfactuals as an overlooked validity criterion. When behavioral counterfactuals exist, variation in item responses indicates variation in the presence, magnitude, or temporal unfolding of behaviors. By contrast, responses to non-counterfactual items and measures represent an indefinite mix of behavioral variation and variation in raters' evaluations of the social significance of behaviors. I offer a typology of behaviorally non-counterfactual item formulations and conduct a large-scale review showing that non-counterfactual item formulations are a severe and widespread problem that has intensified in recent decades. Such non-counterfactual measurement undermines the correctness of research findings and the clarity of action recommendations for managers. Using the stylized example of helpful and harmful leadership, I illustrate how non-counterfactual measures can gain erroneous empirical support and provide flawed and opaque "information" about effective leadership behavior. To reinvigorate research, I provide recommendations for ensuring behavioral counterfactuals, for example, through better questionnaires and coding schemes, experimentation, and technology-based measurement.