Classifying and summarizing the large data sets produced by different sky survey projects is essential for subsequent scientific research. By combining data from 4XMM-DR13, Sloan Digital Sky Survey (SDSS) DR18, and CatWISE, we constructed an XMM-WISE-SDSS sample that includes information in the X-ray, optical, and infrared bands. By cross-matching this sample with sources that have known spectral classifications from SDSS and LAMOST, we obtained a training data set containing stars, galaxies, quasars, and young stellar objects (YSOs). Two machine learning methods, CatBoost and Self-Paced Ensemble (SPE), were trained on this data set and then used to classify the XMM-WISE-SDSS sample. Notably, the SPE classifier showed excellent performance in YSO classification, identifying 1102 YSO candidates from 160,545 sources, including 258 known YSOs. We then checked whether these candidates are YSOs using their LAMOST spectra and their identifications in the SIMBAD and VizieR databases. This leaves 412 YSO candidates without a previous identification. The discovery of these new YSOs is an important addition to existing YSO samples and will deepen our understanding of star formation and evolution. We also provide a classification catalog for the entire XMM-WISE-SDSS sample.