Abstract

With ever-increasing numbers of astrophysical transient surveys, new facilities, and archives of astronomical time series, time domain astronomy is emerging as a mainstream discipline. However, the sheer volume of data alone (hundreds of observations for hundreds of millions of sources) necessitates advanced statistical and machine learning methodologies for scientific discovery: characterization, categorization, and classification. Whilst these techniques are slowly entering the astronomer’s toolkit, their application to astronomical problems is not without its issues. In this paper, we review some of the challenges posed by trying to identify variable stars in large data collections, including choosing appropriate feature representations, dealing with uncertainties, establishing ground truths, and working with simple discrete classes.

Highlights

  • Time domain astronomy is entering a golden age with an ever-increasing number of new instruments and facilities dedicated to repeated observations of large swathes of sky every few nights or so

  • The success of automated classification relies on having access to the greatest amount of information, and in this era of data-intensive astronomy there is no lack of science exploration and discovery to do

  • We found that no single algorithm generally achieved better than ∼60% accuracy across the full data set

Summary

Introduction

Time domain astronomy is entering a golden age, with an ever-increasing number of new instruments and facilities dedicated to repeated observations of large swathes of sky every few nights or so. Even though many of these are dedicated to looking for real-time transients (sources that have changed significantly from past observations, if any exist), they can quickly generate substantial archives of data. Systematic searches of these archives for particular types of astrophysical source or phenomenon require new approaches, and in recent years there have been a number of studies attempting automated classification and outlier detection using machine learning-based techniques (see Table 1). These are all with an eye to the next generation of synoptic sky surveys, e.g., Gaia ([1]), ZTF ([2]), and LSST ([3]), which will increase the amount of available data by several orders of magnitude and mandate such approaches.

[Table 1. Surveys listed: ASAS, CoRoT, Kepler, Hipparcos, OGLE-II, Stripe 82, LINEAR, CRTS, VVV, EROS-2, WISE. Bands listed: V, ZY JHKS, BR, IR.]

How to automatically classify a data set
Characterizing astronomical time series
Unstated assumptions
Not all features are equal
The most important feature: period
Are we using the best features?
Which classifier?
Dealing with uncertainties
Establishing ground truths
Class distinctions
Extremes
Findings
Summary and future work
