Abstract
Redshift measurement of active galactic nuclei (AGNs) remains a time-consuming and challenging task, as it requires follow-up spectroscopic observations and detailed analysis. Hence, there is an urgent need for alternative redshift estimation techniques. The use of machine learning (ML) for this purpose has been growing over the last few years, primarily due to the availability of large-scale galactic surveys. However, due to observational errors, a significant fraction of these data sets often have missing entries, rendering that fraction unusable for ML regression applications. In this study, we demonstrate the performance of an imputation technique called Multivariate Imputation by Chained Equations (MICE), which addresses missing data entries by imputing them using the available information in the catalog. We use the Fermi-LAT Fourth Data Release Catalog (4LAC) and impute 24% of the catalog. Subsequently, we follow the methodology described in Dainotti et al. (ApJ, 2021, 920, 118) and create an ML model for estimating the redshift of 4LAC AGNs. We present results that highlight the positive impact of the MICE imputation technique on the machine learning models' performance and on the obtained redshift estimation accuracy.
Highlights
Spectroscopic redshift measurement of Active Galactic Nuclei (AGNs) is a highly time-consuming operation and a strong limiting factor for large-scale extragalactic surveys
To ensure the best possible imputations, we use all 1897 AGNs that remain after the removal of outliers and of AGNs that are neither BL Lacertae objects (BLLs) nor Flat Spectrum Radio Quasars (FSRQs)
As the plots in Figure 4 show, the Multivariate Imputation by Chained Equations (MICE) imputations follow the underlying distribution for the three predictors, and we confidently incorporate them into our analysis
Summary
Spectroscopic redshift measurement of Active Galactic Nuclei (AGNs) is a highly time-consuming operation and a strong limiting factor for large-scale extragalactic surveys. One technique that has gained significant momentum is the use of machine learning (ML) to determine the photometric redshift of AGNs (Brescia et al. 2013, 2019; Dainotti et al. 2021; Nakoneczny et al. 2019; Jones and Singal 2017; Cavuoti et al. 2014; Fotopoulou and Paltani 2018; Logan and Fotopoulou 2020; Yang et al. 2017; Zhang et al. 2019; Curran 2020; Pasquet-Itam and Pasquet 2018). Large AGN data sets derived from all-sky surveys such as the Wide-field Infrared Survey Explorer (WISE; Brescia et al. 2019; Ilbert et al. 2008; Hildebrandt et al. 2010; Brescia et al. 2013; Wright et al. 2010; D’Isanto and Polsterer 2018) and the Sloan Digital Sky Survey (SDSS; Aihara et al. 2011) have played a significant role in the proliferation of ML approaches. Almost all of these large data sets suffer from missing entries, which can lead to a considerable portion of the data being discarded.
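To make the chained-equations idea concrete, the following is a minimal sketch in Python with NumPy, not the implementation used in the study. It assumes plain linear regression as the per-column model and produces a single imputed data set, whereas MICE proper draws several imputations and may use other per-column methods. The function name `chained_imputation`, the synthetic three-column data, and the 24% masking fraction (chosen only to echo the abstract) are all illustrative assumptions.

```python
import numpy as np

def chained_imputation(X, n_iter=10):
    """Fill NaNs by cycling per-column linear regressions (simplified MICE-style sketch)."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = X.copy()
    # Step 1: start every missing entry at its column mean.
    col_means = np.nanmean(X, axis=0)
    filled[missing] = np.take(col_means, np.where(missing)[1])
    # Step 2: for each column in turn, regress it on the other columns
    # (using rows where it is observed) and re-predict its missing entries.
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            miss_j = missing[:, j]
            if not miss_j.any():
                continue
            others = np.delete(filled, j, axis=1)
            A = np.column_stack([np.ones(len(others)), others])  # intercept + predictors
            coef, *_ = np.linalg.lstsq(A[~miss_j], filled[~miss_j, j], rcond=None)
            filled[miss_j, j] = A[miss_j] @ coef
    return filled

# Synthetic stand-in for a catalog with three correlated predictors (hypothetical data).
rng = np.random.default_rng(0)
x0 = rng.normal(size=200)
x1 = 2.0 * x0 + 0.1 * rng.normal(size=200)
x2 = x0 + x1 + 0.1 * rng.normal(size=200)
data = np.column_stack([x0, x1, x2])

# Hide roughly 24% of the entries at random (illustrative fraction only).
mask = rng.random(data.shape) < 0.24
data_missing = data.copy()
data_missing[mask] = np.nan

imputed = chained_imputation(data_missing)

# Compare against the known true values, which is only possible for synthetic data.
rmse = np.sqrt(np.mean((imputed[mask] - data[mask]) ** 2))
print(f"RMSE of imputed entries: {rmse:.3f}")
```

Observed entries are left untouched and only the masked ones are re-estimated on each pass. In practice an established implementation, such as the R `mice` package or scikit-learn's experimental `IterativeImputer`, would normally be preferable to a hand-written loop like this one.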