Abstract

Galaxy morphology characterisation is an important area of study, as the type and formation of galaxies offer insights into the origin and evolution of the universe. Owing to the increased availability of images of galaxies, scientists have turned to crowd-sourcing to automate the process of instance labelling. However, research has shown that using crowd-sourced labels for galaxy classification comes with many pitfalls. An alternative approach to galaxy classification is metric learning. Metric learning allows for improved representations for classification, anomaly detection, information retrieval, clustering and dimensionality reduction. Understanding the implications of this approach for crowd-sourced labels is of paramount importance if scientists intend to continue using them. This paper compares metric learning and classification models trained or fine-tuned on both the crowd-sourced Galaxy Zoo 2 (GZ2-H) dataset and the expert-labelled EFIGI catalogue. The study uses the Revised Shapley-Ames (RSA) catalogue of bright galaxies, also labelled by experts, as an unseen test set. The RSA catalogue allows for an accurate comparison of how well the models predict the Hubble types of galaxies. Classification accuracy alone suggests that the crowd-sourced and expert models are comparable on the surface. However, alternative metrics show that the models trained on the expert dataset outperformed the models trained on the crowd-sourced data when actual and predicted labels are compared. Further, the results show that fine-tuning a model pre-trained on crowd-sourced data can outperform the state of the art in galaxy characterisation. The models trained to predict the Hubble types of galaxies perform better when fine-tuned using the Proxy-NCA and Normalised-Softmax loss functions than with other pairwise losses.
The Normalised-Softmax loss yielded the best overall 9-class models, with accuracies of 30.88% (GZ2-H) and 30.05% (EFIGI) and MAP values of 0.3483 (GZ2-H) and 0.3889 (EFIGI). The Proxy-NCA loss produced the second-best overall 9-class models, with accuracies of 30.33% (GZ2-H) and 20.03% (EFIGI) and MAP values of 0.3577 (GZ2-H) and 0.3917 (EFIGI). Finally, the paper highlights the need for caution when utilising crowd-sourced labels; however, it argues that transfer learning from crowd-sourced labelled data to expert-labelled data can still lead to significant improvements.
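
The point that models can look comparable on accuracy yet differ on other metrics can be illustrated with a minimal sketch. The labels and the two hypothetical models below are made up for illustration and are not taken from the paper; the function names `accuracy` and `macro_f1` are likewise assumptions, with `macro_f1` standing in for the unweighted F1 score (the plain mean of per-class F1 values).

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted (macro) F1: mean of per-class F1, each class counted equally."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # A class that is never predicted correctly contributes an F1 of 0.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy, imbalanced ground truth (illustrative only, not data from the paper).
y_true = ["E"] * 6 + ["Sa"] * 2 + ["Irr"] * 2

# Hypothetical model A: always predicts the majority class.
model_a = ["E"] * 10

# Hypothetical model B: makes errors, but spreads predictions across classes.
model_b = ["E"] * 4 + ["Sa"] * 2 + ["Sa", "E"] + ["Irr", "E"]

for name, preds in [("A", model_a), ("B", model_b)]:
    print(name, accuracy(y_true, preds), round(macro_f1(y_true, preds), 4))
```

Both toy models reach an accuracy of 0.6, but the macro F1 is 0.25 for model A and roughly 0.58 for model B, because model A never recovers the two minority classes. This is one way in which accuracy on an imbalanced test set can hide differences that per-class metrics expose.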

Highlights

  • In trying to understand the origin and evolution of the universe, an important area of focus is galaxy characterisation

  • The same model trained on the EFIGI catalogue achieved an accuracy of 29.52% and an unweighted F1 score of 0.2822 when tested on the Revised Shapley-Ames (RSA) catalogue using 9 Hubble classes

  • The results show that the best accuracy is obtained when fine-tuning on Galaxy Zoo 2 Hubble (GZ2-H) with nine classes


Summary

Introduction

In trying to understand the origin and evolution of the universe, an important area of focus is galaxy characterisation. The type and formation of galaxies offer clues and insights into the development of the universe [1]. Galaxy morphological characterisation separates galaxies into classes based on their physical structures. A galaxy's morphological characterisation typically falls within three major categories, namely elliptical, spiral and irregular. An elliptical galaxy has an ellipse-shaped light profile. Spiral galaxies are disk-shaped and have multiple curved arms originating from the centre. Irregular galaxies do not fit into the elliptical and spiral categories. A widely accepted system for this characterisation of galaxies is the Hubble tuning fork [2].

