The authors add to published empirical findings that an appropriate range of validity evidence for achievement tests, including those in science and mathematics, has either not been gathered, not been reported, or is not accessible for independent review. The current study focuses on a sample of published, peer-reviewed science intervention studies from a single journal that used science achievement measures over an 11-year period and included some discussion of validity; it finds results similar to those of previous studies conducted in other STEM areas and broader contexts. The consensus of the educational measurement profession is that validity is the extent to which evidence and theory support the proposed interpretations of test scores for their proposed uses. What is troubling is that a shortfall in validity evidence raises concerns about faulty, or insubstantial, test score interpretations used to inform students' short-term and long-term education and career trajectories and to guide improvements to curricular interventions. The discussion cautions readers that although the study reports shortfalls in some kinds of validity evidence, this simply raises a flag for test users to consider which kinds of validity evidence apply to their own uses. It also raises a flag for the profession to explore the reasons for these shortfalls.