Conventional spirometry produces measurement error by using repeatability criteria (RC) to discard acceptable data and by terminating tests early when RC are met. These practices also implicitly assume that there is no variation across maneuvers within each test. This has implications for air pollution regulations that rely on pulmonary function tests to determine adverse effects or set standards. We perform a Monte Carlo simulation of 20,902 tests of forced expiratory volume in 1 second (FEV1), each with eight maneuvers, for an individual with empirically obtained, plausibly normal pulmonary function. Default coefficients of variation for inter‐ and intratest variability (3% and 6%, respectively) are employed. Measurement error is defined as the difference between results from the conventional protocol and an unconstrained, eight‐maneuver alternative. In the default model, average measurement error is ∼5%, and the minimum difference necessary for statistical significance at p < 0.05 in a before/after comparison is 16%. By contrast, the U.S. Environmental Protection Agency has deemed single‐digit percentage decrements in FEV1 sufficient to justify more stringent national ambient air quality standards. Sensitivity analysis reveals that results are insensitive to intertest variability but highly sensitive to intratest variability. Halving the latter to 3% reduces measurement error by 55%. Increasing it to 9% or 12% increases measurement error by 65% or 125%, respectively. Within‐day FEV1 differences ≤5% among normal subjects are believed to be clinically insignificant. Therefore, many differences reported as statistically significant are likely to be artifactual. Reliable data are needed to estimate intratest variability for the general population, subpopulations of interest, and research samples.
Sensitive subpopulations (e.g., patients with chronic obstructive pulmonary disease [COPD], asthmatics, and children) are likely to have higher intratest variability, making it more difficult to draw valid statistical inferences about differences observed after treatment or exposure.
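The simulation design described above can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions that the text does not specify: a hypothetical true FEV1 of 4.0 L, normally distributed variation, a conventional protocol that requires at least three maneuvers, stops once the two largest values agree within 0.150 L, and reports the largest value obtained, and an unconstrained alternative that reports the mean of all eight maneuvers. The function name and parameters are illustrative only.

```python
import numpy as np


def simulate_measurement_error(
    n_tests=20_902,       # number of simulated tests (from the text)
    n_maneuvers=8,        # maneuvers per test (from the text)
    true_fev1=4.0,        # ASSUMPTION: hypothetical true FEV1 in litres
    cv_inter=0.03,        # intertest coefficient of variation (text default)
    cv_intra=0.06,        # intratest coefficient of variation (text default)
    rc_litres=0.150,      # ASSUMPTION: repeatability criterion in litres
    min_maneuvers=3,      # ASSUMPTION: minimum maneuvers before stopping
    seed=0,
):
    """Return the relative measurement error for each simulated test."""
    rng = np.random.default_rng(seed)

    # Each test has its own mean, drawn around the true value (intertest).
    test_means = rng.normal(true_fev1, cv_inter * true_fev1, size=n_tests)

    # Each maneuver varies around its test mean (intratest).
    loc = test_means[:, None]
    maneuvers = rng.normal(loc, cv_intra * loc, size=(n_tests, n_maneuvers))

    conventional = np.empty(n_tests)
    for i, row in enumerate(maneuvers):
        # Stop as soon as the two largest values so far meet the criterion;
        # if it is never met, all eight maneuvers are used.
        for k in range(min_maneuvers, n_maneuvers + 1):
            second, largest = np.sort(row[:k])[-2:]
            if largest - second <= rc_litres:
                break
        # ASSUMPTION: the conventional result is the largest value observed.
        conventional[i] = row[:k].max()

    # ASSUMPTION: the unconstrained result is the mean of all eight maneuvers.
    unconstrained = maneuvers.mean(axis=1)

    return (conventional - unconstrained) / unconstrained


if __name__ == "__main__":
    for cv in (0.03, 0.06, 0.09, 0.12):
        err = simulate_measurement_error(cv_intra=cv)
        print(f"intratest CV {cv:.0%}: mean relative error {err.mean():.1%}")
```

Running the sketch with different values of `cv_intra` and `cv_inter` mimics the kind of sensitivity analysis described above. The numbers it prints depend on the assumptions listed and should not be expected to match the reported results.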