Abstract

With growing data volumes from synoptic surveys, astronomers must become more abstracted from the discovery and introspection processes. Given the scarcity of follow-up resources, there is a particularly sharp onus on the frameworks that replace these human roles to provide accurate and well-calibrated probabilistic classification catalogs. Such catalogs inform the subsequent follow-up, allowing consumers to optimize the selection of specific sources for further study and permitting rigorous treatment of purities and efficiencies for population studies. Here, we describe a process to produce a probabilistic classification catalog of variability with machine learning from a multi-epoch photometric survey. In addition to producing accurate classifications, we show how to estimate calibrated class probabilities, and motivate the importance of probability calibration. We also introduce a methodology for feature-based anomaly detection, which allows discovery of objects in the survey that do not fit within the predefined class taxonomy. Finally, we apply these methods to sources observed by the All Sky Automated Survey (ASAS), and unveil the Machine-learned ASAS Classification Catalog (MACC), which is a 28-class probabilistic classification catalog of 50,124 ASAS sources. We estimate that MACC achieves a sub-20% classification error rate, and demonstrate that the class posterior probabilities are reasonably calibrated. MACC classifications compare favorably to the classifications of several previous domain-specific ASAS papers and to the ASAS Catalog of Variable Stars, which had classified only 24% of those sources into one of 12 science classes. The MACC is publicly available at http://www.bigmacc.info.

Highlights

  • Synoptic imaging surveys have begun to routinely collect dozens to thousands of epochs of photometric data over wide swaths of the sky

  • For each All Sky Automated Survey (ASAS) star, we find the Naval Observatory Merged Astrometric Dataset (NOMAD) source with the highest classifier probability of ‘match’, preferring the spatially closer source when multiple NOMAD sources return identical probabilities

  • Starting with the list of NOMAD sources associated with ASAS sources, our algorithm looks for a SIMBAD source which is spatially close to the NOMAD source, calling a match any SIMBAD source which is within 0.5 arcseconds of the NOMAD source
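
The two cross-matching rules in the highlights above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the match probabilities are assumed to come from an already-trained classifier, the dictionary keys (`p_match`, `sep_arcsec`, `ra`, `dec`) and function names are hypothetical, and the haversine formula is one common way to compute angular separation that the source does not specify.

```python
import math


def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcseconds (haversine); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    sin_ddec = math.sin((dec2 - dec1) / 2.0)
    sin_dra = math.sin((ra2 - ra1) / 2.0)
    a = sin_ddec ** 2 + math.cos(dec1) * math.cos(dec2) * sin_dra ** 2
    return math.degrees(2.0 * math.asin(min(1.0, math.sqrt(a)))) * 3600.0


def best_nomad_match(candidates):
    """Pick the candidate with the highest match probability.

    Ties in probability are broken in favor of the smaller separation.
    Each candidate is a dict with hypothetical keys 'p_match' and 'sep_arcsec'.
    """
    if not candidates:
        return None
    return min(candidates, key=lambda c: (-c["p_match"], c["sep_arcsec"]))


def simbad_matches(nomad, simbad_sources, radius_arcsec=0.5):
    """Return every SIMBAD source within radius_arcsec of the NOMAD position."""
    matches = []
    for src in simbad_sources:
        sep = angular_separation_arcsec(
            nomad["ra"], nomad["dec"], src["ra"], src["dec"]
        )
        if sep <= radius_arcsec:
            matches.append((sep, src))
    # Closest sources first; sort on separation only.
    matches.sort(key=lambda pair: pair[0])
    return [src for _, src in matches]
```

Sorting on the tuple `(-p_match, sep_arcsec)` encodes the stated preference directly: probability decides first, and separation only matters when probabilities are identical.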

Summary

Introduction

Synoptic imaging surveys have begun to routinely collect dozens to thousands of epochs of photometric data over wide swaths of the sky. In Richards et al (2012), we introduced a methodology to overcome the debilitating effects of non-representative training sets on variable star classification, and in Long et al (2012) we devised methods to appropriately use light curve data from older surveys to classify periodic variable stars in new surveys. With these advances, the accuracy of variable star classification is improving demonstrably, with cross-validated error rates approaching 15–20% on multi-class problems with different data sets (Dubath et al 2011; Richards et al 2011). We detail how to cross-match sources with external catalogs to obtain further classification features (e.g., color) and use a method to impute the values of those attributes when no match is detected. We use this methodology to create a calibrated probabilistic classification catalog for a set of 50,124 sources in the All Sky Automated Survey (ASAS; Pojmanski 1997) based on their publicly available ASAS V-band light curves and colors.

ASAS Data Collection
ASAS Photometric Light Curves
Querying the Naval Observatory Merged Astrometric Dataset
[Figure: axis label "Source average magnitude"]
Light Curve Feature Extraction
Novel Light-Curve Features
Correcting Eclipsing Periods
Treating Aliased Periods
Feature Importance
Training the Classifier
Calibrating Classifier Probabilities
Detecting Anomalous Objects
The Catalog
Substituting Different Class Priors
Difficult Class Boundaries
Comparison to Literature
Confident MACC Classifications missed by ACVS
Classical Cepheids
Beta Cephei
Double-Mode RR Lyrae
Orion Belt Objects
Findings
Conclusions