The interaction between groundwater and surface water is increasingly recognized as important for both understanding and managing water systems. Many efforts have been made to characterize and quantify groundwater–surface water exchange. In particular, temperature-based methods have quickly become established because of their low cost and practical advantages. In the last 15 years, several methods have been developed for interpreting passive temperature time series measured in the streambed. However, these methods have been benchmarked only under specific and distinct hydrological conditions. This article aims to fill this research gap by benchmarking the performance of six commonly used methods for deriving seepage fluxes, using a two-year-long temperature time series that covers various meteorological and thermal conditions. This work compares three analytical methods that calculate seepage flux from the amplitude and/or phase of the temperature signals, and three numerical methods that use different schemes to inversely solve the one-dimensional heat transport equation in the streambed. The temperature measurements were made in the context of an international dispute between Chile and Bolivia over the status and use of the waters of the Silala River, northern Chile. Flux estimates are tested against Darcy fluxes derived from measured hydraulic gradients and hydraulic conductivity. Flux estimates from the benchmarked methods ranged from −0.5 to 3.5 m/d (with positive fluxes directed downwards), whereas fluxes estimated using Darcy’s law ranged from 0.5 to 6 m/d. Results show that the amplitude method is the best-performing method.
It is best suited for estimating the direction of the fluxes, while the method using both the thermal amplitude and phase is best suited for monthly flux trends, and the combination of a Local Polynomial (LP) method and a Maximum Likelihood Estimator (MLE) method (LPMLEn) is appropriate for estimating flux in transient conditions. The use of heat as a tracer proved to be an effective tool for monitoring groundwater–surface water exchange in a river reach for two years, and yielded exchange flux estimates with lower point-scale variability than those obtained from Darcy’s law.
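As an illustration of the Darcy reference flux used for comparison, the following is a minimal sketch that assumes one-dimensional vertical flow and the sign convention stated above (positive fluxes directed downwards). The function name, argument names, and numerical values are hypothetical and are not taken from the study.

```python
def darcy_flux(conductivity_m_per_d, head_top_m, head_bottom_m, spacing_m):
    """Vertical Darcy flux in m/d, positive downwards.

    conductivity_m_per_d: vertical hydraulic conductivity K (m/d)
    head_top_m:           hydraulic head at the upper point, e.g. the stream (m)
    head_bottom_m:        hydraulic head at the lower point in the streambed (m)
    spacing_m:            vertical distance between the two points (m)
    """
    if spacing_m <= 0:
        raise ValueError("spacing_m must be positive")
    # Head decreasing with depth drives water downwards (positive flux).
    gradient = (head_top_m - head_bottom_m) / spacing_m
    return conductivity_m_per_d * gradient


# Hypothetical values: K = 10 m/d, 5 cm head drop over 0.5 m of streambed.
q_down = darcy_flux(10.0, 1.00, 0.95, 0.5)   # about 1 m/d, downwards
q_up = darcy_flux(10.0, 0.95, 1.00, 0.5)     # about -1 m/d, upwards
```

A negative result under this convention indicates upward flow, that is, groundwater discharging to the stream.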