Molecular dynamics (MD) simulations are widely applied to estimate absolute binding free energies of protein-ligand and protein-protein complexes. A routinely used method for binding free energy calculations with MD is umbrella sampling (US), which calculates the potential of mean force (PMF) along a single reaction coordinate. Surprisingly, in spite of its widespread use, only a few validation studies have examined the convergence of the free energy computed along a single path, and these have considered specific cases rather than the reproducibility of such calculations in general. In this work, we therefore investigate the reproducibility and convergence of US along a standard distance-based reaction coordinate for various protein-protein and protein-ligand complexes, following commonly used guidelines for the setup. We show that repeating the complete US workflow can lead to differences of 2-20 kcal/mol in the computed binding free energies. We attribute these discrepancies to small differences in the binding pathways. While such differences are unavoidable in the established US protocol, the protocol's popularity could hint at a lack of awareness of these reproducibility problems. To test whether the convergence of PMF profiles improves when multiple pathways are sampled simultaneously, we performed additional simulations with an adaptive-biasing method, here the accelerated weight histogram (AWH) approach. Indeed, the PMFs obtained from AWH simulations are consistent and reproducible for the systems tested. To the best of our knowledge, our work is the first to attempt a systematic assessment of the pitfalls in one of the most widely used protocols for computing binding affinities. We therefore anticipate that our results will provide an incentive for a critical reassessment of the validity of PMFs computed with US, and that they make a strong case for further benchmarking the performance of adaptive-biasing methods for computing binding affinities.
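The following is a minimal sketch of how a US PMF along a single coordinate is assembled from harmonically biased windows. It uses the weighted histogram analysis method (WHAM), one common estimator for unbiasing umbrella windows. The abstract does not say which estimator the authors used, so this should not be read as their workflow. Everything here is a hypothetical toy: the function name `wham`, the cosine "true" PMF, the force constant `k`, the window spacing, and energies in units of kT. Instead of running MD in each window, bin indices are drawn directly from the biased Boltzmann distribution. This keeps the example self-contained, but it also means the sketch cannot reproduce the pathway-dependence problem discussed above, which arises from real sampling of orthogonal degrees of freedom.

```python
import math
import random

BETA = 1.0  # toy assumption: all energies in units of kT


def wham(counts, n_samples, bias, beta=BETA, tol=1e-7, max_iter=20000):
    """Minimal 1D WHAM (hypothetical helper, for illustration only).

    counts[i][b]  -- histogram count of window i in bin b
    n_samples[i]  -- total number of samples in window i
    bias[i][b]    -- umbrella bias energy of window i at the centre of bin b
    Returns the PMF per bin (units of 1/beta), shifted so its minimum is 0.
    """
    n_win, n_bins = len(counts), len(counts[0])
    pooled = [sum(counts[i][b] for i in range(n_win)) for b in range(n_bins)]
    f = [0.0] * n_win  # per-window free-energy offsets f_i
    for _ in range(max_iter):
        # Unbiased probability: P(b) = sum_i n_i(b) / sum_i N_i exp(-beta (w_i(b) - f_i))
        p = [pooled[b] / sum(n_samples[i] * math.exp(-beta * (bias[i][b] - f[i]))
                             for i in range(n_win))
             for b in range(n_bins)]
        # Self-consistent offsets: exp(-beta f_i) = sum_b P(b) exp(-beta w_i(b))
        new_f = [-math.log(sum(p[b] * math.exp(-beta * bias[i][b]) for b in range(n_bins))) / beta
                 for i in range(n_win)]
        new_f = [x - new_f[0] for x in new_f]  # fix the arbitrary additive constant
        converged = max(abs(a - c) for a, c in zip(new_f, f)) < tol
        f = new_f
        if converged:
            break
    pmf = [-math.log(pb) / beta if pb > 0 else math.inf for pb in p]
    lowest = min(pmf)
    return [x - lowest for x in pmf]


# Toy reaction-coordinate grid and a hypothetical underlying PMF (not from the paper).
bin_centres = [0.05 + 0.1 * b for b in range(30)]


def true_pmf(x):
    return 1.5 * math.cos(2 * math.pi * x / 1.5)


k = 20.0  # assumed umbrella force constant (kT per unit^2)
window_centres = [0.25 * i for i in range(13)]
bias = [[0.5 * k * (x - c) ** 2 for x in bin_centres] for c in window_centres]

# Stand-in for the MD run in each window: draw bins from exp(-beta (U + w_i)).
rng = random.Random(0)
n_per_window = 20000
counts = []
for w in bias:
    weights = [math.exp(-BETA * (true_pmf(x) + wb)) for x, wb in zip(bin_centres, w)]
    draws = rng.choices(range(len(bin_centres)), weights=weights, k=n_per_window)
    hist = [0] * len(bin_centres)
    for d in draws:
        hist[d] += 1
    counts.append(hist)

pmf = wham(counts, [n_per_window] * len(bias), bias)
reference = [true_pmf(x) for x in bin_centres]
ref_min = min(reference)
reference = [r - ref_min for r in reference]
max_err = max(abs(a - r) for a, r in zip(pmf, reference))
print(f"max |PMF error| = {max_err:.3f} kT")
```

In this idealized setting the recovered profile matches the toy PMF to within statistical noise, because every window samples the same one-dimensional landscape. In a real protein complex, each US window is seeded from a single pulled pathway. Repeating that pull can therefore change the landscape the windows see, which is the source of the irreproducibility reported above.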