Context. The study of active galactic nuclei (AGNs) is fundamental to discern the formation and growth of supermassive black holes (SMBHs) and their connection with star formation and galaxy evolution. Due to the significant kinetic and radiative energy emitted by powerful AGNs, they are prime candidates to observe the interplay between SMBH and stellar growth in galaxies. Aims. We aim to develop a method to predict the AGN nature of a source, its radio detectability, and redshift purely based on photometry. The use of such a method will increase the number of radio AGNs, allowing us to improve our knowledge of accretion power into an SMBH, the origin and triggers of radio emission, and its impact on galaxy evolution. Methods. We developed and trained a pipeline of three machine learning (ML) models than can predict which sources are more likely to be an AGN and to be detected in specific radio surveys. Also, it can estimate redshift values for predicted radio-detectable AGNs. These models, which combine predictions from tree-based and gradient-boosting algorithms, have been trained with multi-wavelength data from near-infrared-selected sources in the Hobby-Eberly Telescope Dark Energy Experiment (HETDEX) Spring field. Training, testing, calibration, and validation were carried out in the HETDEX field. Further validation was performed on near-infrared-selected sources in the Stripe 82 field. Results. In the HETDEX validation subset, our pipeline recovers 96% of the initially labelled AGNs and, from AGNs candidates, we recover 50% of previously detected radio sources. For Stripe 82, these numbers are 94% and 55%. Compared to random selection, these rates are two and four times better for HETDEX, and 1.2 and 12 times better for Stripe 82. The pipeline can also recover the redshift distribution of these sources with σNMAD = 0.07 for HETDEX (σNMAD = 0.09 for Stripe 82) and an outlier fraction of 19% (25% for Stripe 82), compatible with previous results based on broad-band photometry. Feature importance analysis stresses the relevance of near- and mid-infrared colours to select AGNs and identify their radio and redshift nature. Conclusions. Combining different algorithms in ML models shows an improvement in the prediction power of our pipeline over a random selection of sources. Tree-based ML models (in contrast to deep learning techniques) facilitate the analysis of the impact that features have on the predictions. This prediction can give insight into the potential physical interplay between the properties of radio AGNs (e.g. mass of black hole and accretion rate).
Read full abstract