Abstract

Ground-based optical surveys such as PanSTARRS, DES, and LSST will produce large catalogs to limiting magnitudes of r > 24. Star-galaxy separation poses a major challenge to such surveys because galaxies, even very compact galaxies, outnumber halo stars at these depths. We investigate photometric classification techniques on stars and galaxies with intrinsic FWHM < 0.2 arcsec. We consider unsupervised spectral energy distribution template fitting and supervised, data-driven Support Vector Machines (SVM). For template fitting, we use a Maximum Likelihood (ML) method and a new Hierarchical Bayesian (HB) method, which learns the prior distribution of template probabilities from the data. SVM requires training data to classify unknown sources; ML and HB do not. For SVM, we consider (i) a best-case scenario (SVM_best) where the training data are (unrealistically) a random sampling of the data in both signal-to-noise and demographics, and (ii) a more realistic scenario (SVM_real) where training is done on higher signal-to-noise data at brighter apparent magnitudes. Testing with COSMOS ugriz data, we find that HB outperforms ML, delivering ~80% completeness, with purity of ~60-90% for both stars and galaxies, respectively. We find that no algorithm delivers perfect performance, and that studies of metal-poor main-sequence turnoff stars may be challenged by poor star-galaxy separation. Using the Receiver Operating Characteristic curve, we find a best-to-worst ranking of SVM_best, HB, ML, and SVM_real. We conclude, therefore, that a well-trained SVM will outperform template-fitting methods. However, an SVM trained under the realistic scenario (SVM_real) performs worse than both template-fitting methods. Thus, Hierarchical Bayesian template fitting may prove to be the optimal classification method in future surveys.
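
The completeness and purity figures quoted above can be made concrete with a short sketch. It assumes the conventional definitions (completeness: the fraction of true members of a class that are classified as that class; purity: the fraction of objects assigned to a class that truly belong to it). The function name and the toy labels are hypothetical and are not taken from the paper or its data.

```python
def completeness_and_purity(true_labels, predicted_labels, target):
    """Return (completeness, purity) for the class named `target`.

    Assumed definitions (standard usage, not quoted from the paper):
      completeness = correctly classified targets / all true targets
      purity       = correctly classified targets / all objects labeled target
    """
    pairs = list(zip(true_labels, predicted_labels))
    true_pos = sum(1 for t, p in pairs if t == target and p == target)
    n_true = sum(1 for t, _ in pairs if t == target)
    n_pred = sum(1 for _, p in pairs if p == target)
    # Guard against empty classes rather than dividing by zero.
    completeness = true_pos / n_true if n_true else float("nan")
    purity = true_pos / n_pred if n_pred else float("nan")
    return completeness, purity


# Hypothetical toy catalog: 3 true stars and 5 true galaxies.
truth = ["star", "star", "star", "galaxy", "galaxy", "galaxy", "galaxy", "galaxy"]
guess = ["star", "star", "galaxy", "star", "galaxy", "galaxy", "galaxy", "galaxy"]

star_c, star_p = completeness_and_purity(truth, guess, "star")
gal_c, gal_p = completeness_and_purity(truth, guess, "galaxy")
print(f"stars:    completeness={star_c:.2f} purity={star_p:.2f}")
print(f"galaxies: completeness={gal_c:.2f} purity={gal_p:.2f}")
```

Because galaxies outnumber stars at faint magnitudes, even a small fraction of misclassified galaxies can lower the purity of a star sample considerably, which is why the two quantities are reported per class.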
