Abstract

Machine-learning-based classifiers have become indispensable in astrophysics, allowing astronomical sources to be separated into classes with the computational efficiency needed for the enormous data volumes that wide-area surveys now produce. In the standard supervised classification paradigm, a model is trained and validated using data from relatively small areas of sky, before being used to classify sources in other areas of the sky. However, population shifts between the training examples and the sources to be classified can lead to 'silent' degradation in model performance, which can be challenging to identify when the ground truth is not available. In this letter, we present a novel methodology using the nannyml Confidence-Based Performance Estimation (CBPE) method to predict the classifier F1-score in the presence of population shifts, but without ground-truth labels. We apply CBPE to the selection of quasars with decision-tree ensemble models, using broad-band photometry, and show that the F1-scores are predicted remarkably well (MAPE $\sim 10$ per cent; $R^2 = 0.74$–$0.92$). We discuss potential use-cases in the domain of astronomy, including machine-learning model and/or hyperparameter selection, and evaluation of the suitability of training data sets for a particular classification problem.
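
The idea behind estimating an F1-score without labels can be illustrated with a minimal sketch. It is written in plain Python and is not the nannyml API; the function name `estimate_f1_cbpe` and the 0.5 threshold are assumptions made for illustration. The sketch also assumes that the classifier's output probabilities are well calibrated, so that each probability can be read as the chance that the source truly belongs to the positive class (here, a quasar). Under that assumption, the expected counts of true positives, false positives, and false negatives follow from the probabilities alone.

```python
from typing import Sequence


def estimate_f1_cbpe(calibrated_proba: Sequence[float], threshold: float = 0.5) -> float:
    """Estimate the F1-score from calibrated probabilities, without labels.

    Hypothetical helper for illustration only (not part of nannyml).
    Assumes each value is a calibrated probability of the positive class.
    """
    expected_tp = 0.0  # predicted positive, truly positive
    expected_fp = 0.0  # predicted positive, truly negative
    expected_fn = 0.0  # predicted negative, truly positive

    for p in calibrated_proba:
        if p >= threshold:
            # Predicted positive: correct with probability p.
            expected_tp += p
            expected_fp += 1.0 - p
        else:
            # Predicted negative: a missed positive with probability p.
            expected_fn += p

    denominator = 2.0 * expected_tp + expected_fp + expected_fn
    # Convention chosen here: return 0.0 when F1 is undefined.
    return 2.0 * expected_tp / denominator if denominator > 0 else 0.0


# Toy usage: four unlabelled sources with made-up calibrated probabilities.
toy_proba = [0.9, 0.8, 0.3, 0.1]
print(f"Estimated F1: {estimate_f1_cbpe(toy_proba):.3f}")
```

If a population shift makes the classifier less confident on the new sources, the probabilities move towards the threshold and the estimated F1-score drops, which is what allows degradation to be flagged without ground truth. The estimate is only as good as the calibration, so a shift that breaks the calibration itself would not be captured by this sketch.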
