Abstract

To extract information from the clustering of galaxies on non-linear scales, we need to model the connection between galaxies and haloes accurately and flexibly. Standard halo occupation distribution (HOD) models assume that the galaxy occupation of a halo is a function of its mass alone; in reality, however, the occupation can depend on various other parameters, including halo concentration, assembly history, environment, and spin. Using the IllustrisTNG hydrodynamical simulation as our target, we show that machine learning tools can be used to capture this high-dimensional dependence and provide more accurate galaxy occupation models. Specifically, we use a random forest regressor to identify which secondary halo parameters best model the galaxy–halo connection, and symbolic regression to augment the standard HOD model with simple equations capturing the dependence on those parameters, namely the local environmental overdensity and shear at the location of a halo. This not only provides insights into the galaxy formation relationship but also, more importantly, significantly improves the clustering statistics of the modelled galaxies. Our approach demonstrates that machine learning tools can help us better understand and model the galaxy–halo connection, and are therefore useful for galaxy formation and cosmology studies with upcoming galaxy surveys.
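
As a rough illustration of what a mass-only HOD looks like, and of how a secondary environmental dependence could in principle be attached to it, the following is a minimal Python sketch. It assumes a commonly used parametrization (an error-function step for centrals and a power law for satellites). All function names and parameter values, and in particular the linear shift of the threshold mass with overdensity (`a_env`), are hypothetical placeholders and not the equations or fits obtained in this work.

```python
import math


def mean_centrals(log_m, log_m_min=12.0, sigma_log_m=0.3):
    """Mean number of central galaxies in a halo of mass 10**log_m.

    Mass-only HOD: a smooth step from 0 to 1 around the threshold mass.
    Parameter values are illustrative placeholders.
    """
    return 0.5 * (1.0 + math.erf((log_m - log_m_min) / sigma_log_m))


def mean_satellites(log_m, log_m0=12.0, log_m1=13.3, alpha=1.0):
    """Mean number of satellites: a power law above a cutoff mass M0."""
    m, m0, m1 = 10.0 ** log_m, 10.0 ** log_m0, 10.0 ** log_m1
    if m <= m0:
        return 0.0
    return ((m - m0) / m1) ** alpha


def mean_centrals_env(log_m, delta_env, a_env=0.1,
                      log_m_min=12.0, sigma_log_m=0.3):
    """Hypothetical environment-dependent variant of mean_centrals.

    Assumption for illustration only: the threshold mass shifts linearly
    with the local overdensity delta_env, so haloes in denser regions
    host a central galaxy at slightly lower mass.
    """
    shifted = log_m_min - a_env * delta_env
    return mean_centrals(log_m, log_m_min=shifted, sigma_log_m=sigma_log_m)


if __name__ == "__main__":
    for log_m in (11.5, 12.0, 12.5, 13.5):
        print(log_m,
              mean_centrals(log_m),
              mean_centrals_env(log_m, delta_env=1.0),
              mean_satellites(log_m))
```

With `delta_env = 0` the environment-dependent variant reduces to the mass-only model, which is the sense in which such a term augments the standard HOD rather than replacing it.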
