Abstract
Deep-learning models have been increasingly exploited in astrophysical studies, but these data-driven algorithms are prone to producing biased outputs that are detrimental to subsequent analyses. In this work, we investigate two main forms of bias: class-dependent residuals and mode collapse. We do this in a case study in which we estimate photometric redshift as a classification problem using convolutional neural networks (CNNs) trained with galaxy images and associated spectroscopic redshifts. We focus on point estimates and propose a set of consecutive steps for resolving the two biases based on CNN models, involving representation learning with multichannel outputs, balancing the training data, and leveraging soft labels. The residuals can be viewed as a function of spectroscopic redshift or photometric redshift, and the biases with respect to these two definitions are incompatible and should be treated individually. We suggest that a prerequisite for resolving biases in photometric space is resolving biases in spectroscopic space. Experiments show that our methods can better control biases than benchmark methods, and that they are robust under a range of implementation and training conditions given high-quality data. Our methods hold promise for future cosmological surveys that require a good constraint of biases, and they may be applied to regression problems and other studies that make use of data-driven models. Nonetheless, the bias-variance tradeoff and the requirement of sufficient statistics suggest that we need better methods and optimized data usage strategies.
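The statement that biases defined against spectroscopic and photometric redshift are incompatible can be illustrated with a toy calculation. The sketch below is not the authors' pipeline: it uses synthetic redshifts rather than CNN outputs, and it assumes the commonly used residual definition (z_photo - z_spec) / (1 + z_spec), which the abstract itself does not specify. The function name `binned_mean_residual`, the redshift range, and the noise level are all hypothetical choices made for illustration.

```python
import numpy as np

def binned_mean_residual(z_bin_by, z_spec, z_photo, edges):
    """Mean of (z_photo - z_spec) / (1 + z_spec) in bins of `z_bin_by`.

    The residual definition is an assumption (a common convention),
    not something taken from the abstract.
    """
    residual = (z_photo - z_spec) / (1.0 + z_spec)
    idx = np.digitize(z_bin_by, edges) - 1  # bin index of each galaxy
    n_bins = len(edges) - 1
    means = np.full(n_bins, np.nan)
    for b in range(n_bins):
        in_bin = idx == b  # galaxies outside the edges are never selected
        if in_bin.any():
            means[b] = residual[in_bin].mean()
    return means

# Synthetic sample: estimates that are unbiased at any fixed z_spec.
rng = np.random.default_rng(0)
z_spec = rng.uniform(0.0, 0.4, size=200_000)
z_photo = z_spec + rng.normal(0.0, 0.03, size=z_spec.size)
edges = np.linspace(0.0, 0.4, 9)

# Same residuals, viewed as a function of z_spec and of z_photo.
bias_vs_spec = binned_mean_residual(z_spec, z_spec, z_photo, edges)
bias_vs_photo = binned_mean_residual(z_photo, z_spec, z_photo, edges)

print("mean residual in z_spec bins: ", np.round(bias_vs_spec, 4))
print("mean residual in z_photo bins:", np.round(bias_vs_photo, 4))
```

In this toy setup the mean residual is consistent with zero in every z_spec bin, yet it is negative in the lowest z_photo bin and positive in the highest, because the sample boundaries truncate the scatter. This is only meant to show why the two definitions need not agree; it says nothing about the size of the effect in real data.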
Highlights
Estimating galaxy redshifts is crucial for studies of galaxy evolution and cosmology
Following our discussion of correcting for zspec-dependent biases, we investigated the behavior of the biases and the performance of our methods by varying the implementation and training conditions of the convolutional neural network (CNN) models
We analyzed two biases that are generally present in data-driven methods, namely class-dependent residuals and mode collapse, which are two effects imposed by the prior of training data and the model implementation
Summary
Estimating galaxy redshifts is crucial for studies of galaxy evolution and cosmology. While redshifts obtained by spectroscopic measurements (spec-z) typically have high accuracy, they are highly time-intensive and not ideal for the extremely large data sizes from ongoing or future imaging surveys. There are two broad categories of methods for estimating photometric redshifts for individual galaxies: template-fitting methods and data-driven methods (see Salvato et al. 2019 for a review). Template-fitting methods model the galaxy spectral energy distribution (SED) and infer redshifts by fitting the galaxy photometry based on the SED templates (e.g., Arnouts et al. 1999; Feldmann et al. 2006; Ilbert et al. 2006; Greisel et al. 2015; Leistedt et al. 2019).