Abstract

We investigate machine learning (ML) techniques for predicting the number of galaxies (N_gal) that occupy a halo, given the halo's properties. These types of mappings are crucial for constructing the mock galaxy catalogs necessary for analyses of large-scale structure. The ML techniques proposed here differ from traditional halo occupation distribution (HOD) modeling in that they do not assume a prescribed relationship between halo properties and N_gal. In addition, our ML approaches depend only on parent halo properties (like HOD methods), which is an advantage over subhalo-based approaches because identifying subhalos correctly is difficult. We test two algorithms: support vector machines (SVM) and k-nearest-neighbour (kNN) regression. We take galaxies and halos from the Millennium simulation and predict N_gal by training our algorithms on the following six halo properties: number of particles, M_200, \sigma_v, v_max, half-mass radius and spin. For Millennium, our predicted N_gal values have a mean-squared-error (MSE) of ~0.16 for both SVM and kNN. Our predictions match the overall distribution of halos reasonably well and the galaxy correlation function at large scales to ~5-10%. In addition, we demonstrate a feature selection algorithm to isolate the halo parameters that are most predictive, a useful technique for understanding the mapping between halo properties and N_gal. Lastly, we investigate these ML-based approaches in making mock catalogs for different galaxy subpopulations (e.g. blue, red, high M_star, low M_star). Given its non-parametric nature as well as its powerful predictive and feature selection capabilities, machine learning offers an interesting alternative for creating mock catalogs.
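To make the kNN regression and MSE evaluation described above concrete, the following is a minimal sketch in plain Python. It is not the authors' pipeline: the data are synthetic, the two features are hypothetical stand-ins for standardized halo properties, and the relation used to generate the toy N_gal values is invented purely for illustration. The helper names (knn_predict, mean_squared_error, make_toy_halos) are likewise assumptions.

```python
import math
import random

def knn_predict(train_X, train_y, query, k=5):
    """Predict a target as the mean of the k nearest training targets."""
    # Euclidean distance from the query point to every training point
    dists = [(math.dist(x, query), y) for x, y in zip(train_X, train_y)]
    dists.sort(key=lambda pair: pair[0])
    nearest = dists[:k]
    return sum(y for _, y in nearest) / len(nearest)

def mean_squared_error(y_true, y_pred):
    """Average squared difference between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def make_toy_halos(n, rng):
    """Hypothetical stand-in data: two scaled halo features and a toy N_gal."""
    X, y = [], []
    for _ in range(n):
        mass = rng.uniform(0.0, 1.0)   # stand-in for a scaled halo mass
        spin = rng.uniform(0.0, 1.0)   # stand-in for a scaled spin
        n_gal = round(10 * mass ** 2)  # invented relation, illustration only
        X.append((mass, spin))
        y.append(n_gal)
    return X, y

rng = random.Random(0)
train_X, train_y = make_toy_halos(400, rng)
test_X, test_y = make_toy_halos(100, rng)

# Predict N_gal for held-out halos and score the predictions with the MSE
pred_y = [knn_predict(train_X, train_y, x, k=5) for x in test_X]
mse = mean_squared_error(test_y, pred_y)

# Baseline: always predict the mean N_gal of the training set
train_mean = sum(train_y) / len(train_y)
baseline = mean_squared_error(test_y, [train_mean] * len(test_y))
print(f"kNN MSE: {mse:.3f}, mean-only baseline MSE: {baseline:.3f}")
```

The predictor never assumes a functional form for N_gal; it only averages over similar halos in the training set, which is the sense in which the approach is non-parametric. In practice the features would need a common scale before distances are computed, since properties such as particle number and spin differ by orders of magnitude.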

Highlights

  • As we enter the era of large-scale structure experiments such as LSST, WFIRST, and Euclid, the creation of reliable mock galaxy catalogs will become increasingly important.

  • One of the most popular methods for doing this is the halo occupation distribution (HOD), an analytic model for determining the number of galaxies (N_gal) that should form in a halo given its properties (e.g., Zheng et al 2009).

  • The machine learning (ML) methods we propose here can only build mock catalogs based on the output from cosmological simulations of the universe with both N-body and hydrodynamics.
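For contrast with the ML approach, the sketch below shows what a prescribed HOD relationship can look like. It uses a five-parameter form that is common in the HOD literature (a smoothed step for central galaxies plus a power law for satellites); this is not necessarily the form used in the work cited above, and the parameter values are illustrative only, not fitted to any data.

```python
import math

def mean_central(log_m, log_m_min, sigma_log_m):
    """Mean number of central galaxies: a smoothed step in log halo mass."""
    return 0.5 * (1.0 + math.erf((log_m - log_m_min) / sigma_log_m))

def mean_satellite(log_m, log_m0, log_m1, alpha):
    """Satellite power law above a cutoff mass M0 (zero at or below it)."""
    m, m0, m1 = 10.0 ** log_m, 10.0 ** log_m0, 10.0 ** log_m1
    if m <= m0:
        return 0.0
    return ((m - m0) / m1) ** alpha

def mean_n_gal(log_m, params):
    """Mean occupation: centrals plus satellites modulated by the centrals."""
    n_cen = mean_central(log_m, params["log_m_min"], params["sigma_log_m"])
    n_sat = n_cen * mean_satellite(
        log_m, params["log_m0"], params["log_m1"], params["alpha"]
    )
    return n_cen + n_sat

# Illustrative parameter values only; not fitted to any data set.
params = {
    "log_m_min": 12.0,
    "sigma_log_m": 0.3,
    "log_m0": 12.0,
    "log_m1": 13.3,
    "alpha": 1.0,
}

for log_m in (11.0, 12.0, 13.0, 14.0):
    print(f"log10(M) = {log_m:4.1f}  <N_gal> = {mean_n_gal(log_m, params):.3f}")
```

Here the mean occupation depends on halo mass alone through a fixed functional form, so any dependence on other halo properties has to be added by hand.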


Summary

Introduction

As we enter the era of large-scale structure experiments such as LSST, WFIRST, and Euclid, the creation of reliable mock galaxy catalogs will become increasingly important. Such catalogs are essential for correctly characterizing the expected errors in the analyses of these data sets, calibrating analysis pipelines, and measuring cosmological parameters (such as the dark energy equation of state) from galaxy clustering (e.g., Anderson et al 2012). Making mock catalogs for different subpopulations of galaxies (e.g., blue versus red, high M_star versus low M_star) to study their clustering properties is of utmost importance for understanding galaxy formation and evolution (e.g., Coil et al 2008; Guo et al 2012). Although these mock catalogs can be generated relatively quickly using perturbation-theory-based approaches such as that described in Manera et al (2013), it is well known that these approximations break down at small scales (e.g., Carlson et al 2009).

