Context. Hot subdwarf stars are compact, blue, evolved objects that burn helium in their cores and are surrounded by a thin hydrogen envelope. In the Hertzsprung-Russell Diagram they are located at the blue end of the Horizontal Branch. Most models favour a common-envelope binary evolution scenario during the Red Giant phase. However, the binarity rate of these objects remains an unsolved, yet key, question in this field. Aims. This study aims to develop a novel classification method for identifying hot subdwarf binaries within large datasets using Artificial Intelligence techniques and data from the third Gaia data release (GDR3). The results are compared with those obtained previously using Virtual Observatory techniques on coincident samples. Methods. We apply both supervised and unsupervised machine learning techniques to hot subdwarf binary classification. Specifically, we used Support Vector Machines (SVM) to classify 3084 hot subdwarf stars based on their colour-magnitude properties. Among these, 2815 objects have Gaia DR3 BP/RP spectra, which were classified using Self-Organizing Maps (SOM) and Convolutional Neural Networks (CNN). To ensure spectral quality, the 2815 BP/RP spectra were pre-analysed prior to SOM and CNN classification with two different approaches: the cosine similarity technique and the Uniform Manifold Approximation and Projection (UMAP) technique. An additional analysis of a golden sample of 88 well-defined objects is also presented. Results. The findings show a high level of agreement (∼70–90%) with the classifications from the Virtual Observatory SED Analyzer (VOSA) tool. This indicates that the SVM, SOM, and CNN methods classify sources with an accuracy comparable to human inspection or non-AI techniques. Notably, SVM with a radial basis function kernel achieves 70.97% reproducibility for binary targets using photometry, and CNN reaches 84.94% for binary detection using spectroscopy.
We also found that the single–binary differences are most apparent in the infrared flux of our Gaia DR3 BP/RP spectra, at wavelengths longer than ∼700 nm. Conclusions. We find that all the methods used are in fairly good agreement and are particularly effective at distinguishing between single and binary systems. The agreement is also consistent with the results previously obtained with VOSA. Overall, considering all quality metrics, CNN is the method that provides the best accuracy. The methods also appear effective for detecting peculiarities in the spectra. While the results are promising, the difficulty of handling objects with uncertain compositions calls for caution: further research is needed to refine the techniques and improve the reliability of automated classification, particularly for large-scale surveys.
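As an illustration of the cosine similarity pre-analysis mentioned under Methods, the following is a minimal sketch and not the authors' pipeline. It assumes each BP/RP spectrum is a flux vector sampled on a common wavelength grid, compares every spectrum with a per-bin median template, and flags those falling below a similarity threshold. The median template, the 0.95 threshold, the function names, and the toy flux values are all assumptions introduced for this example.

```python
import math
from statistics import median


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length flux vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for an all-zero spectrum
    return dot / (norm_a * norm_b)


def median_template(spectra):
    """Per-wavelength-bin median over all spectra (assumed reference shape)."""
    return [median(column) for column in zip(*spectra)]


def flag_outliers(spectra, threshold=0.95):
    """Indices of spectra whose similarity to the template is below threshold."""
    template = median_template(spectra)
    return [
        i
        for i, spectrum in enumerate(spectra)
        if cosine_similarity(spectrum, template) < threshold
    ]


# Toy spectra on five wavelength bins (blue to red); the values are invented.
spectra = [
    [10.0, 8.0, 6.0, 4.0, 2.0],    # blue-dominated
    [20.0, 16.5, 12.0, 8.0, 4.5],  # same shape, brighter
    [5.0, 4.0, 3.1, 2.0, 1.0],     # same shape, fainter
    [1.0, 2.0, 6.0, 9.0, 10.0],    # red-dominated, unlike the rest
    [15.0, 12.0, 9.0, 6.2, 3.0],   # blue-dominated
]
outliers = flag_outliers(spectra)
print(outliers)
```

Because cosine similarity depends only on the shape of the flux vector and not on its overall scale, bright and faint spectra with the same slope score alike, which is why it can serve as a shape-based quality screen before classification.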