An inadvertent consequence of pesticide use is aquatic pesticide pollution, which has prompted the implementation of mitigation measures in many countries. Water quality monitoring programs are an important tool to evaluate the efficacy of these mitigation measures. However, large interannual variability of pesticide losses makes it challenging to detect significant improvements in water quality and to attribute these improvements to the application of specific mitigation measures. As a result, the literature offers researchers and authorities little guidance on the number of years of aquatic pesticide monitoring, or the effect size (e.g., loss reduction), required to detect significant trends in water quality. Our research addresses this issue by combining two exceptional empirical data sets with modelling to explore the relationship between the pesticide reduction achieved by mitigation measures and the length of the observation period needed to establish statistically significant trends. Our study includes both a large catchment (Rhine at Basel, ∼36,300 km²) and a small catchment (Eschibach, 1.2 km²), which represent spatial scales at either end of the spectrum that would be realistic for monitoring programs designed to assess water quality. Our results highlight several requirements that a monitoring program must meet to allow for trend detection. Firstly, sufficient baseline monitoring is required before implementing mitigation measures. Secondly, the availability of pesticide use data helps account for the interannual variability and temporal trends, but such data are usually lacking. Finally, the timing and magnitude of hydrological events relative to pesticide application can obscure the observable effects of mitigation measures (especially in small catchments). Our results indicate that a strong reduction (i.e., 70–90 %) is needed to detect a change within 10 years of monitoring data.
A more sensitive method for change detection may also be more prone to false positives. Our results suggest that the sensitivity of trend detection should be weighed against the risk of false positives when selecting an appropriate method, and that applying more than one method can provide more confidence in trend detection.
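The interplay between reduction level, monitoring length, and false positives can be illustrated with a small Monte Carlo sketch. The code below is not the method used in this study. It is a minimal illustration that assumes annual pesticide losses vary log-normally around a constant baseline, that a mitigation measure scales the mean loss by a fixed fraction, and that a Mann-Kendall test (normal approximation, no tie correction) is used for trend detection. The function names (`mann_kendall`, `detection_rate`) and all parameter values, including the variability `sigma_log`, are hypothetical choices made for illustration only.

```python
import math
import random


def mann_kendall(series):
    """Return (S, two-sided p-value) of the Mann-Kendall trend test.

    Uses the normal approximation without a tie correction, which is
    adequate for continuous values such as the simulated losses below.
    """
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of the pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)  # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return s, p


def detection_rate(reduction, n_baseline=5, n_after=5, sigma_log=0.8,
                   n_sim=1000, alpha=0.05, seed=1):
    """Fraction of simulated records showing a significant decrease.

    reduction: assumed fractional loss reduction after mitigation (0 to 1).
    sigma_log: assumed interannual variability (std. dev. of log losses);
               an illustrative value, not an estimate from any data set.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        before = [math.exp(rng.gauss(0.0, sigma_log))
                  for _ in range(n_baseline)]
        after = [(1.0 - reduction) * math.exp(rng.gauss(0.0, sigma_log))
                 for _ in range(n_after)]
        s, p = mann_kendall(before + after)
        if s < 0 and p < alpha:
            hits += 1
    return hits / n_sim


if __name__ == "__main__":
    # reduction = 0.0 gives the rate of falsely detected decreasing trends
    for reduction in (0.0, 0.3, 0.7, 0.9):
        print(reduction, detection_rate(reduction))
```

With `reduction=0.0` the returned value is the rate of spurious decreasing trends under these assumptions. Comparing it with the rates for larger reductions shows how sensitivity and false positives can be examined side by side for a given record length.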