ABSTRACT Modern cosmological hydrodynamical galaxy simulations provide tens of thousands of reasonably realistic synthetic galaxies across cosmic time. However, quantitatively assessing the level of realism of simulated universes in comparison to the real one is difficult. In this paper of the Extracting Reality from Galaxy Observables with Machine Learning series, we utilize contrastive learning to directly compare a large sample of simulated and observed galaxies based on their stellar-light images. This eliminates the need to specify summary statistics and allows to exploit the whole information content of the observations. We produce survey-realistic galaxy mock data sets resembling real Hyper Suprime-Cam (HSC) observations using the cosmological simulations TNG50 and TNG100. Our focus is on galaxies with stellar masses between 109 and 1012 M⊙ at z = 0.1–0.4. This allows us to evaluate the realism of the simulated TNG galaxies in comparison to actual HSC observations. We apply the self-supervised contrastive learning method Nearest Neighbour Contrastive Learning to the images from both simulated and observed data sets (g-, r-, i-bands). This results in a 256-dimensional representation space, encoding all relevant observable galaxy properties. First, this allows us to identify simulated galaxies that closely resemble real ones by seeking similar images in this multidimensional space. Even more powerful, we quantify the alignment between the representations of these two image sets, finding that the majority (≳ 70 per cent) of the TNG galaxies align well with observed HSC images. However, a subset of simulated galaxies with larger sizes, steeper Sérsic profiles, smaller Sérsic ellipticities, and larger asymmetries appears unrealistic. We also demonstrate the utility of our derived image representations by inferring properties of real HSC galaxies using simulated TNG galaxies as the ground truth.
Read full abstract