Abstract

Dimension-reduction techniques can greatly improve statistical inference in astronomy. A standard approach is to use Principal Components Analysis (PCA). In this work we apply a recently developed technique, diffusion maps, to astronomical spectra for data parameterization and dimensionality reduction, and develop a robust, eigenmode-based framework for regression. We show how our framework provides a computationally efficient means by which to predict redshifts of galaxies, and thus could inform more expensive redshift estimators such as template cross-correlation. It also provides a natural means by which to identify outliers (e.g., misclassified spectra, spectra with anomalous features). We analyze 3835 SDSS spectra and show how our framework yields a more than 95% reduction in dimensionality. Finally, we show that the prediction error of the diffusion-map-based regression approach is markedly smaller than that of a similar approach based on PCA, clearly demonstrating the superiority of diffusion maps over PCA for this regression task.

Highlights

  • Galaxy spectra are classic examples of high-dimensional data, with thousands of measured fluxes providing information about the physical conditions of the observed object

  • In this work we present a unified framework for regression and data parameterization of astronomical spectra

  • We show that for the types of high-dimensional and complex data sets often analyzed in astronomy, diffusion maps can yield far better results than commonly used methods such as Principal Components Analysis (PCA)


Summary

Introduction

Galaxy spectra are classic examples of high-dimensional data, with thousands of measured fluxes providing information about the physical conditions of the observed object. We introduce the diffusion map framework (see, e.g., Coifman & Lafon 2006; Lafon & Lee 2006) to astronomy, comparing and contrasting it with PCA for regression analysis of SDSS galaxy spectra. PCA is a linear technique; the diffusion map approach, on the other hand, is non-linear and retains distances that reflect the (local) connectivity of the data. This method is robust to outliers and is often able to unravel the intrinsic geometry and the natural (non-linear) coordinates of the data. Our PCA- and diffusion-map-based approaches provide a fast and statistically rigorous means of identifying outliers in redshift data.
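To make the idea of "distances that reflect local connectivity" concrete, the following is a minimal sketch of a basic diffusion map construction in Python with NumPy. It is an illustration under stated assumptions, not the authors' pipeline: the Gaussian kernel, the bandwidth `eps`, the diffusion time `t`, the function name `diffusion_map`, and the synthetic spiral data (standing in for spectra) are all choices made here for the example.

```python
import numpy as np


def diffusion_map(X, eps, n_coords=2, t=1):
    """Sketch of a basic diffusion map (assumed Gaussian kernel, bandwidth eps).

    X is an (n_samples, n_features) array. Returns the sorted eigenvalues and
    the first n_coords non-trivial diffusion coordinates of each sample.
    """
    # Pairwise squared Euclidean distances between samples.
    sq_norms = np.sum(X**2, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    d2 = np.maximum(d2, 0.0)  # guard against tiny negative round-off

    # Gaussian weights: only nearby samples are strongly connected.
    W = np.exp(-d2 / eps)

    # Row sums of W; P = D^{-1} W would be the Markov transition matrix.
    d = W.sum(axis=1)

    # Work with the symmetric matrix S = D^{-1/2} W D^{-1/2}, which has the
    # same eigenvalues as P, so a symmetric eigensolver can be used.
    S = W / np.sqrt(np.outer(d, d))
    evals, evecs = np.linalg.eigh(S)

    # eigh returns ascending eigenvalues; reorder to descending.
    order = np.argsort(evals)[::-1]
    evals = evals[order]
    evecs = evecs[:, order]

    # Convert to right eigenvectors of P: psi = D^{-1/2} v.
    psi = evecs / np.sqrt(d)[:, None]

    # Skip the trivial constant eigenvector (eigenvalue 1) and weight the
    # remaining ones by eigenvalue**t to obtain diffusion coordinates.
    coords = psi[:, 1:n_coords + 1] * evals[1:n_coords + 1] ** t
    return evals, coords


# Illustrative data only: a noisy 2-D spiral, whose natural coordinate
# (position along the curve) is non-linear in the observed variables.
rng = np.random.default_rng(0)
theta = np.sort(rng.uniform(0.0, 3.0 * np.pi, 200))
X = np.column_stack([theta * np.cos(theta), theta * np.sin(theta)])
X = X + 0.05 * rng.normal(size=X.shape)

evals, coords = diffusion_map(X, eps=1.0, n_coords=2)
print(coords.shape)
```

In a sketch like this, the leading diffusion coordinates would play the role that principal component scores play in a PCA-based analysis, for example as the regressors in a redshift prediction model. The value of `eps` controls how local the connectivity is and would normally need tuning for real data.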

Diffusion Maps and Data Parameterization
Adaptive Regression Using Orthogonal Eigenfunctions
Risk: Theory and Estimation
Redshift Prediction Using SDSS Spectra
Data Preparation
Analysis
Comparison With Other Methods
Summary
Principal Components Analysis
Findings
Prediction Intervals for Spectroscopic Redshift Estimates
