Abstract

In this paper, we apply five machine learning algorithms to classify samples from the Sloan Digital Sky Survey into four categories: spirals, ellipticals, mergers, and stars (don’t know). The aim is to assess the feasibility of machine learning methods for future surveys. Treating mergers as a separate class is challenging because they are easily confused with both ellipticals and spirals; as a result, most previous studies have not included mergers as a distinct morphological class. A second challenge is that the dataset is highly imbalanced, with far more ellipticals and spirals than stars and mergers. Starting with 62 features, we perform principal component analysis and use the 25 most significant principal components as inputs to the machine learning models. Comparing our results with the Galaxy Zoo labels, we obtain an overall test accuracy of 98.2% with an artificial neural network and 97.5% with ExtraTrees. However, ExtraTrees outperforms the neural network in classifying mergers and stars. We also perform a parameter sensitivity test to compare the relative importance of different categories of features for the model’s performance. Finally, we address the class imbalance problem by examining the effects of different sampling strategies. Our results show that a balanced dataset with a large number of training samples leads to high recall values for the minority classes, and that oversampling methods perform better than undersampling techniques.
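To illustrate the class imbalance point, the following is a minimal sketch in plain Python of random oversampling and per-class recall. It is not the paper's implementation: the abstract does not say which oversampling methods were used, so random duplication is an assumption, and the helper names `random_oversample` and `per_class_recall`, the class counts, and the placeholder samples are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until every class
    has as many samples as the largest class (assumed strategy)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)
    target = max(len(group) for group in by_class.values())

    out_samples, out_labels = [], []
    for label, group in by_class.items():
        # Draw the missing samples with replacement from the same class.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


def per_class_recall(y_true, y_pred):
    """Recall per class: correct predictions / true members of that class."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / total[label] for label in total}


# Made-up class counts that mimic an imbalanced label set.
labels = ["spiral"] * 50 + ["elliptical"] * 40 + ["merger"] * 5 + ["star"] * 5
samples = list(range(len(labels)))  # placeholders for feature vectors

balanced_samples, balanced_labels = random_oversample(samples, labels)
class_counts = Counter(balanced_labels)

# A classifier that always predicts the majority class gets half of this
# toy set right, yet has zero recall on mergers and stars.
majority_only = ["spiral"] * len(labels)
recalls = per_class_recall(labels, majority_only)
```

After oversampling, every class in this toy set has 50 samples. The per-class recall shows why overall accuracy alone can hide poor performance on minority classes such as mergers and stars.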
