Using Multivariate Imputation by Chained Equations to Predict Redshifts of Active Galactic Nuclei

Spencer James Gibson, Agnieszka Pollo, Enrico Rinaldi, Malgorzata Bogdan, Artem Poliszczuk, Ioannis Liodakis, Maria Giovanna Dainotti, Aditya Narendra

doi:10.3389/fspas.2022.836215

Abstract

Redshift measurement of active galactic nuclei (AGNs) remains a time-consuming and challenging task, as it requires follow-up spectroscopic observations and detailed analysis. Hence, there is an urgent need for alternative redshift estimation techniques. The use of machine learning (ML) for this purpose has been growing over the last few years, primarily due to the availability of large-scale galactic surveys. However, due to observational errors, a significant fraction of these data sets often has missing entries, rendering that fraction unusable for ML regression applications. In this study, we demonstrate the performance of an imputation technique called Multivariate Imputation by Chained Equations (MICE), which rectifies the issue of missing data entries by imputing them using the available information in the catalog. We use the Fermi-LAT Fourth Data Release Catalog (4LAC) and impute 24% of the catalog. Subsequently, we follow the methodology described in Dainotti et al. (ApJ, 2021, 920, 118) and create an ML model for estimating the redshift of 4LAC AGNs. We present results that highlight the positive impact of the MICE imputation technique on the ML model's performance and on the resulting redshift estimation accuracy.
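
To make the idea of chained-equations imputation concrete, the following is a minimal sketch in Python with NumPy. It is not the authors' pipeline and does not use the 4LAC catalog: the function name `chained_imputation`, the toy three-column data, and the missingness pattern are all assumptions made for illustration. It is also a deterministic simplification: full MICE draws imputed values from a predictive distribution and typically produces several imputed data sets, whereas this sketch only cycles through per-column linear regressions.

```python
import numpy as np


def chained_imputation(X, n_iter=10):
    """Deterministic chained-equations imputation (a simplification of MICE).

    X is a 2-D float array in which np.nan marks missing entries.
    Returns a copy of X with the missing entries filled in.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = X.copy()

    # Step 1: start from a crude guess, the mean of each column.
    col_means = np.nanmean(X, axis=0)
    filled[missing] = np.take(col_means, np.where(missing)[1])

    # Step 2: cycle through the columns, regressing each one on all the others.
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            rows = missing[:, j]
            if not rows.any():
                continue
            others = np.delete(filled, j, axis=1)
            design = np.column_stack([np.ones(len(others)), others])
            # Fit on rows where column j was actually observed...
            coef, *_ = np.linalg.lstsq(design[~rows], filled[~rows, j], rcond=None)
            # ...and replace the current guesses where it was missing.
            filled[rows, j] = design[rows] @ coef
    return filled


# Toy "catalog" (hypothetical): three correlated columns with small noise.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
truth = np.column_stack([a, 2.0 * a + 1.0, -a + 3.0])
truth = truth + rng.normal(scale=0.1, size=truth.shape)

# Remove one entry from each of the first 60 rows, rotating over the columns.
observed = truth.copy()
for i in range(60):
    observed[i, i % 3] = np.nan
mask = np.isnan(observed)

imputed = chained_imputation(observed)

# Baseline for comparison: fill each gap with its column mean.
mean_filled = observed.copy()
mean_filled[mask] = np.take(np.nanmean(observed, axis=0), np.where(mask)[1])

mae_chained = np.abs(imputed[mask] - truth[mask]).mean()
mae_mean = np.abs(mean_filled[mask] - truth[mask]).mean()
print(f"chained-equations MAE: {mae_chained:.3f}, column-mean MAE: {mae_mean:.3f}")
```

Because the toy columns are strongly correlated, the regression-based fills should land much closer to the removed values than column means do. On a real catalog the gain depends on how informative the observed columns are about the missing ones.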

Highlights

Spectroscopic redshift measurement of Active Galactic Nuclei (AGNs) is a highly time-consuming operation and is a strong limiting factor for large-scale extragalactic surveys.
To ensure the best possible imputations, we use all 1897 AGNs that remain after the removal of outliers and of AGNs that are neither BL Lacertae (BLL) objects nor Flat Spectrum Radio Quasars (FSRQ).
As can be discerned from the plots (Figure 4), the Multivariate Imputation by Chained Equations (MICE) imputations follow the underlying distribution for the three predictors, and we confidently incorporate them into our analysis.

Summary

Introduction

Spectroscopic redshift measurement of Active Galactic Nuclei (AGNs) is a highly time-consuming operation and is a strong limiting factor for large-scale extragalactic surveys. One technique that has gained significant momentum is the use of machine learning (ML) to determine the photometric redshift of AGNs (Brescia et al., 2013; Brescia et al., 2019; Dainotti et al., 2021; Nakoneczny et al., 2019; Jones and Singal, 2017; Cavuoti et al., 2014; Fotopoulou and Paltani, 2018; Logan and Fotopoulou, 2020; Yang et al., 2017; Zhang et al., 2019; Curran, 2020; Pasquet-Itam and Pasquet, 2018). Large AGN data sets derived from all-sky surveys like the Wide-field Infrared Survey Explorer (WISE) (Brescia et al., 2019; Ilbert et al., 2008; Hildebrandt et al., 2010; Brescia et al., 2013; Wright et al., 2010; D’Isanto and Polsterer, 2018) and the Sloan Digital Sky Survey (SDSS) (Aihara et al., 2011) have played a significant role in the proliferation of ML approaches. Almost all of these large data sets suffer from missing entries, which can lead to a considerable portion of the data being discarded.



Journal: Frontiers in Astronomy and Space Sciences
Publication Date: Mar 4, 2022
Citations: 4
License type: CC BY 4.0


