As the volume of astronomical observation data grows, applying artificial intelligence methods to automatically analyze and identify the light curves of full samples is becoming an inevitable trend. However, no data set yet covers all known classes of variable stars and meets all research needs. Standard training data sets designed specifically for light-curve classification are still lacking, and existing light-curve training sets or data sets cannot be directly merged into a large collection. Based on the open data sets of the All-Sky Automated Survey for SuperNovae, Gaia, and the Zwicky Transient Facility, we construct a compatible light-curve data set named LEAVES for the automated recognition of variable stars, which can be used for training and testing new classification algorithms. The data set contains a total of 977,953 variable and 134,592 nonvariable light curves, in which the supported variables are divided into six superclasses and nine subclasses. We validate the compatibility of the data set through experiments and use it to train a hierarchical random forest classifier, which achieves a weighted average F1-score of 0.95 for seven-class classification and 0.93 for 10-class classification. The experimental results show that this classifier is more compatible than classifiers built on a single band and a single survey, and that it has wider applicability while maintaining classification accuracy: it can be applied directly to different data types with only a relatively small loss in performance compared to a dedicated model.
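For readers unfamiliar with the metric quoted above, the following is a minimal sketch of a weighted average F1-score, assuming the usual definition in which each class's F1 is weighted by its share of the true labels. The function name `weighted_f1` and the toy labels are hypothetical illustrations; this is not the authors' evaluation code.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores.

    Assumption: each class is weighted by the fraction of true labels
    that belong to it, the common "weighted" averaging convention.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    support = Counter(y_true)  # number of true samples per class
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fn[truth] += 1  # the true class was missed
            fp[pred] += 1   # the predicted class was wrongly assigned

    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        denom = 2 * tp[label] + fp[label] + fn[label]
        f1 = 2 * tp[label] / denom if denom else 0.0
        score += (n / total) * f1
    return score


if __name__ == "__main__":
    # Toy labels only; they do not correspond to the LEAVES class scheme.
    truth = ["class_a", "class_a", "class_a", "nonvariable"]
    guess = ["class_a", "class_a", "nonvariable", "nonvariable"]
    print(round(weighted_f1(truth, guess), 4))
```

Because the weights follow class support, a score such as 0.95 is dominated by the most populous classes, so per-class scores remain worth inspecting for rare variable types.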