Understanding the morphology of galaxies is a critical aspect of astrophysics research, providing insight into the formation, evolution, and physical properties of these vast cosmic structures. Various observational and computational methods have been developed to quantify galaxy morphology, and with the advent of large galaxy simulations, the need for automated and effective classification methods has become increasingly important. This paper investigates the use of principal component analysis (PCA) as an interpretable dimensionality reduction algorithm for galaxy morphology, using the IllustrisTNG cosmological simulation dataset, with the aim of developing a generative model for galaxies. We first generate a dataset of 2D images and 3D cubes of galaxies from the IllustrisTNG simulation, focusing on the mass, metallicity, and stellar age distribution of each galaxy. PCA is then applied to this data, transforming it into a lower-dimensional image space in which proximity between data points corresponds to morphological similarity. We find that PCA can effectively capture the key morphological features of galaxies, with a significant proportion of the variance in the data being explained by a small number of components. With our method we achieve a dimensionality reduction by a factor of $ for 2D images and $ for 3D cubes at a reconstruction error below five percent. Our results illustrate the potential of PCA in compressing large cosmological simulations into an interpretable generative model for galaxies that can easily be used in various downstream tasks such as galaxy classification and analysis.
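The compress-and-reconstruct step described above can be illustrated with a minimal sketch. It is not the pipeline used in this work: the toy images are synthetic elliptical Gaussian blobs standing in for galaxy maps, and the helper names (`fit_pca`, `encode`, `decode`), the image size, and the choice of 10 components are all assumptions made for illustration.

```python
import numpy as np


def fit_pca(images, n_components):
    """Fit PCA to a stack of images via SVD of the mean-centred data."""
    X = images.reshape(len(images), -1).astype(float)  # one flat vector per image
    mean = X.mean(axis=0)
    # Rows of Vt are orthonormal principal components, sorted by variance.
    _, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    explained = s**2 / np.sum(s**2)  # fraction of total variance per component
    return mean, Vt[:n_components], explained[:n_components]


def encode(images, mean, components):
    """Project images onto the principal components (low-dimensional scores)."""
    X = images.reshape(len(images), -1).astype(float)
    return (X - mean) @ components.T


def decode(scores, mean, components, shape):
    """Map scores back to image space."""
    return (scores @ components + mean).reshape(-1, *shape)


# Hypothetical toy data: elliptical Gaussian blobs, NOT IllustrisTNG galaxies.
rng = np.random.default_rng(0)
n_images, size = 200, 16
y, x = np.mgrid[:size, :size] - (size - 1) / 2
sigma = rng.uniform(1.5, 4.0, n_images)[:, None, None]  # blob width
q = rng.uniform(0.4, 1.0, n_images)[:, None, None]      # axis ratio
images = np.exp(-(x[None] ** 2 + (y[None] / q) ** 2) / (2 * sigma**2))

mean, components, explained = fit_pca(images, n_components=10)
scores = encode(images, mean, components)
recon = decode(scores, mean, components, images.shape[1:])

# Relative reconstruction error and compression factor per image.
rel_err = np.linalg.norm(recon - images) / np.linalg.norm(images)
compression = images[0].size / scores.shape[1]

print(f"variance explained by 10 components: {explained.sum():.3f}")
print(f"relative reconstruction error: {rel_err:.4f}")
print(f"compression factor: {compression:.1f}")
```

In this sketch the compression factor is simply the number of pixels per image divided by the number of retained components, and the error is a relative Frobenius norm over the whole stack; the abstract does not specify which error metric underlies the five-percent figure, so that choice is also an assumption.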