Abstract Physics-based earthquake simulators are increasingly popular modeling tools for earthquake forecasting in seismic hazard analysis and for studies of fault rupture behavior. Their popularity stems from their ability to overcome the completeness limitations of real catalogs and to reproduce complex fault rupture and interaction patterns by modeling the physical processes involved in earthquake nucleation and propagation. An important challenge with these models is selecting the physical input parameters that yield the closest similarity to earthquake relationships observed in nature, for instance, the frictional parameters a and b of the rate-and-state law or the initial normal and shear stresses. Because empirical data are scarce, such input parameters are often selected by trial-and-error exploration and predominantly manual analyses of model performance, which can be time consuming. We present a new benchmarking approach to analyze and rank the relative performance of multiple earthquake simulation catalogs simultaneously by quantitatively scoring their combined fit to three types of reference function: (1) earthquake-scaling relationships, (2) the shape of the magnitude–frequency distributions, and (3) the rates of surface ruptures from paleoseismology or paleoearthquake occurrences. The approach provides an effective and potentially more efficient way to identify the models and input parameter combinations that fit earthquake relations and behavior most closely. It also facilitates the exhaustive analysis of many input parameter combinations and helps identify systematic correlations between parameters and model outputs, which can potentially improve the overall understanding of physics-based models. Finally, we demonstrate that the method's results agree with published findings from other earthquake simulation evaluations, which reinforces its overall usefulness.
The model ranking outputs can be useful for subsequent analyses, particularly in seismic hazard applications, such as the selection of appropriate earthquake occurrence rate models and their weighting for a logic tree.
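As a rough illustration of the ranking idea described above, the following Python sketch combines per-catalog misfits against the three reference types into a single score and sorts the catalogs by it. It is only a sketch under stated assumptions, not the scoring scheme of this work: the root-mean-square misfit, the min-max normalization, the equal weighting of the three criteria, the function names `rmse` and `rank_catalogs`, and all numerical values are hypothetical choices made for the example.

```python
from math import sqrt


def rmse(simulated, reference):
    """Root-mean-square misfit between two equally long sequences.

    Assumption: RMSE is used here only as a generic misfit measure.
    """
    if len(simulated) != len(reference):
        raise ValueError("sequences must have the same length")
    squared = sum((s - r) ** 2 for s, r in zip(simulated, reference))
    return sqrt(squared / len(reference))


def rank_catalogs(misfits):
    """Rank catalogs by their combined, normalized misfit (lower is better).

    misfits maps a catalog name to a tuple of misfits, one per criterion,
    e.g. (scaling relationship, magnitude-frequency shape, paleo rates).
    Each criterion is min-max normalized across catalogs so that the three
    criteria contribute on a comparable scale, then summed with equal weight.
    """
    names = list(misfits)
    n_criteria = len(next(iter(misfits.values())))
    totals = {name: 0.0 for name in names}
    for k in range(n_criteria):
        column = [misfits[name][k] for name in names]
        low, high = min(column), max(column)
        span = high - low
        for name in names:
            # If all catalogs tie on a criterion, it does not affect the rank.
            totals[name] += (misfits[name][k] - low) / span if span > 0 else 0.0
    ranking = sorted(names, key=lambda name: totals[name])
    return ranking, totals


# Hypothetical misfits for three simulated catalogs (invented numbers).
example_misfits = {
    "model_A": (0.10, 0.30, 0.20),
    "model_B": (0.40, 0.10, 0.50),
    "model_C": (0.20, 0.20, 0.10),
}
ranking, totals = rank_catalogs(example_misfits)
print(ranking)
```

In this invented example no catalog is best on every criterion, so the combined score is what decides the order. A ranking of this kind could then inform the choice and weighting of models in a logic tree, although how scores translate into weights is left open here.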