The Neyman-Pearson strategy for hypothesis testing can be employed for goodness of fit if the alternative hypothesis is selected from the data by exploring a rich parametrised family of models, while controlling the impact of statistical fluctuations. The New Physics Learning Machine (NPLM) methodology has been developed as a concrete implementation of this idea, targeting the detection of new physical effects in high energy physics collider experiments. In this paper we compare this approach to goodness of fit with others, in particular with classifier-based strategies that share strong similarities with NPLM. In our comparison, NPLM emerges as the test that is more sensitive to small departures of the data from the expected distribution and that is not biased towards detecting specific types of anomalies. These features make it well suited to agnostic searches for new physics at collider experiments. Its deployment in other scientific and industrial scenarios should be investigated.