Abstract

Aim: Spatial sampling bias (SSB) is a feature of opportunistically sampled species records. Species distribution models (SDMs) built using these data (i.e. presence‐background models) can produce biased predictions of suitability across geographic space, confounding species occurrence with the distribution of sampling effort. A wide range of SSB correction methods have been developed, but simulations suggest effects on predictive performance are highly variable. Here, we aim to identify the SSB correction methods that have the highest likelihood of improving true predictive performance, and evaluation strategies that provide a reliable indicator of model performance when independent test data are unavailable.

Location: Global, simulation.

Time Period: Current, simulation.

Methods: A meta‐analysis was used to evaluate the performance of SSB correction methods in studies where there were direct comparisons between corrected and uncorrected SDMs. A simulation model was then developed to test evaluation strategies against a known truth using four common SSB correction methods.

Results: Effect sizes from published studies suggest some support for small positive effects of SSB correction on predictive performance when assessed using independent test data, but this was not evident using internal cross‐validation, and no single method stood out as consistently effective. Simulations support these findings and show that evaluation using internal test data was generally a poor indicator of the true effect of SSB correction. Methods that adjust models relative to a known driver of SSB produced the largest performance gains, but were also the most inconsistent.

Main Conclusions: Correcting SSB in presence‐background SDMs without independent test data to evaluate the effect on model performance requires careful implementation.
We recommend clearer documentation of the effects of SSB correction on SDMs: presenting results from models both with and without correction, evaluating how different assumptions about the form of SSB affect predictions, and making greater efforts to collect independent test datasets to validate model predictions.
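The abstract does not specify which four SSB correction methods were simulated. As a purely illustrative sketch, one widely used correction is spatial thinning, which reduces clustering caused by uneven sampling effort by discarding records that fall too close to an already retained record. The function name, greedy strategy, and planar-coordinate assumption below are all ours, not taken from the study:

```python
import math
import random

def thin_records(points, min_dist, seed=0):
    """Greedy spatial thinning: keep a subset of occurrence records
    such that no two retained points are closer than min_dist.
    Coordinates are treated as planar (x, y) for simplicity; real
    applications would use great-circle distances on lon/lat."""
    rng = random.Random(seed)  # shuffle so retention is not order-dependent
    shuffled = list(points)
    rng.shuffle(shuffled)
    kept = []
    for p in shuffled:
        if all(math.dist(p, q) >= min_dist for q in kept):
            kept.append(p)
    return kept

# A dense cluster (a heavily sampled area) plus two scattered records:
cluster = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
scattered = [(5.0, 5.0), (10.0, 2.0)]
thinned = thin_records(cluster + scattered, min_dist=1.0)
print(len(thinned))  # the cluster collapses to a single record -> 3 kept
```

Thinning trades information for reduced bias, which is one reason corrections of this kind can either help or hurt predictive performance depending on the true sampling process.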