Abstract

Context. The creation of a 3D map of the bulge using RR Lyrae (RRL) is one of the main goals of the VISTA Variables in the Via Lactea Survey (VVV) and VVV(X) surveys. The overwhelming number of sources undergoing analysis undoubtedly requires the use of automatic procedures. In this context, previous studies have introduced the use of machine learning (ML) methods for the task of variable star classification. Aims. Our goal is to develop and test an entirely automatic ML-based procedure for the identification of RRLs in the VVV Survey. This automatic procedure is meant to be used to generate reliable catalogs integrated over several tiles in the survey. Methods. Following the reconstruction of light curves, we extracted a set of period- and intensity-based features, which were already defined in previous works. Also, for the first time, we put a new subset of useful color features to use. We discuss in considerable detail all the appropriate steps needed to define our fully automatic pipeline, namely: the selection of quality measurements; sampling procedures; classifier setup, and model selection. Results. As a result, we were able to construct an ensemble classifier with an average recall of 0.48 and average precision of 0.86 over 15 tiles. We also made all our processed datasets available and we published a catalog of candidate RRLs. Conclusions. Perhaps most interestingly, from a classification perspective based on photometric broad-band data, our results indicate that color is an informative feature type of the RRL objective class that should always be considered in automatic classification methods via ML. We also argue that recall and precision in both tables and curves are high-quality metrics with regard to this highly imbalanced problem. Furthermore, we show for our VVV data-set that to have good estimates, it is important to use the original distribution more abundantly than reduced samples with an artificial balance. Finally, we show that the use of ensemble classifiers helps resolve the crucial model selection step and that most errors in the identification of RRLs are related to low-quality observations of some sources or to the increased difficulty in resolving the RRL-C type given the data.

Highlights

  • The stars of RR Lyrae (RRL) were first imaged in the last decade of the 19th century

  • We would like to note that we use the Cambridge Astronomical Survey Unit (CASU) search service to group the multi-epoch data for the construction of our band-merge catalog for CASU version 1.3 release

  • The “first” phase of the pre-reduction process is done by CASU’s VISTA Data Flow System (VDFS), which produces, among other results, the pawprint, and tile catalogs that we use as our inputs

Read more

Summary

Introduction

The stars of RR Lyrae (RRL) were first imaged in the last decade of the 19th century. Over a century after Fleming’s initial discovery and after a plethora of systematic studies, RRL have come to be known as a class of multi-mode pulsating population II stars with spectral (color) type A or F that reside in the so-called “horizontal branch” of the C-M diagram They are considered excellent standard candles with periodic light curves (LC) and characteristic features that are sensitive to the chemical abundance of the primordial fossil gas from which they formed Smith (2004). In the most relevant antecedent to our work, Elorrieta et al (2016) described the use of ML methods in the search of RRLs in the VVV They consider the complete development of the classifier, starting from the collected data (2265 RRLs and 15 436 other variables), the extraction of 32 features and the evaluation of several classifiers.

The VVV survey and data reduction
Data for this work
Feature extraction: from light-curves to features
Proximity cross-matching
Date–time correction
Period
Tagging: positive and negative samples
Features extraction
Evaluation of feature subsets
Model selection
Sampling size and class imbalance
Setting the final classifier
Analysis of misidentifications
Data release
VizieR Online Data Catalog

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.