ABSTRACT This paper explores the application of machine learning methods for classifying astronomical sources using photometric data, including normal and emission line galaxies (ELGs; starforming, starburst, AGN, broad-line), quasars, and stars. We utilized samples from Sloan Digital Sky Survey (SDSS) Data Release 17 (DR17) and the ALLWISE catalogue, which contain spectroscopically labelled sources from SDSS. Our methodology comprises two parts. First, we conducted experiments, including three-class, four-class, and seven-class classifications, employing the Random Forest (RF) algorithm. This phase aimed to achieve optimal performance with balanced data sets. In the second part, we trained various machine learning methods, such as k-nearest neighbours (KNN), RF, XGBoost (XGB), voting, and artificial neural network (ANN), using all available data based on promising results from the first phase. Our results highlight the effectiveness of combining optical and infrared features, yielding the best performance across all classifiers. Specifically, in the three-class experiment, RF and XGB algorithms achieved identical average F1 scores of 98.93 per cent on both balanced and unbalanced data sets. In the seven-class experiment, our average F1 score was 73.57 per cent. Using the XGB method in the four-class experiment, we achieved F1 scores of 87.9 per cent for normal galaxies (NGs), 81.5 per cent for ELGs, 99.1 per cent for stars, and 98.5 per cent for quasars (QSOs). Unlike classical methods based on time-consuming spectroscopy, our experiments demonstrate the feasibility of using automated algorithms on carefully classified photometric data. With more data and ample training samples, detailed photometric classification becomes possible, aiding in the selection of follow-up observation candidates.
Read full abstract