Robust statistical tools for identifying multiple stellar populations in globular clusters in the presence of measurement errors

G Valle, M Dell’Omodarme, E Tognelli

doi:10.1051/0004-6361/202142454

Abstract

Context. The presence of multiple stellar populations (MPs), which are defined by patterns in the stellar element abundances, is today considered a distinctive feature of globular clusters. However, while data availability and quality have improved in the past decades, the same is not always true of the techniques adopted to analyse them, which raises problems for the objectivity and reproducibility of the claims.

Aims. Using NGC 2808 as a test case, we demonstrate the use of well-established statistical clustering methods. We focus our analysis on the red giant branch phase, where two data sets are available in the recent literature for low- and high-resolution spectroscopy.

Methods. We adopted hierarchical clustering and partition methods. We explicitly addressed the usually neglected problem of measurement errors, for which we relied on techniques that were recently introduced in the statistical literature. The results of the clustering algorithms were subjected to a silhouette width analysis to compare the performance of the split into different numbers of MPs.

Results. For both data sets the results of the statistical pipeline are at odds with those reported in the literature. Two MPs are detected for both data sets, while the literature reports five and four MPs from high- and low-resolution spectroscopy, respectively. The silhouette analysis suggests that the population substructure is reliable for the high-resolution spectroscopy data, while the actual existence of MPs is questionable for the low-resolution spectroscopy data. The discrepancy with literature claims can be explained by the different methods that were adopted to characterise MPs. By means of Monte Carlo simulations and multimodality statistical tests, we show that the often-adopted study of the histogram of the differences in some key elements is prone to multiple false-positive findings.

Conclusions. The adoption of statistically grounded methods, which use all the available information to split the data into subsets and explicitly address the problem of data uncertainty, is of paramount importance to present more robust and reproducible research.
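
To make the silhouette width comparison concrete, the following is a minimal sketch and not the authors' pipeline. It assumes synthetic two-dimensional "abundance-like" data drawn from two hypothetical groups, uses a naive average-linkage hierarchical clustering, and ignores measurement errors entirely, which the paper treats explicitly. The function names `average_linkage` and `mean_silhouette`, the group centres, and the scatter are all illustrative assumptions.

```python
import math
import random

def average_linkage(points, k):
    """Naive agglomerative clustering (average linkage), cut at k clusters.

    Returns one integer label per point. Intended for small samples only.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average pairwise distance between the two clusters.
                d = sum(math.dist(points[a], points[b])
                        for a in clusters[i] for b in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)  # merge the closest pair
    labels = [0] * len(points)
    for c, members in enumerate(clusters):
        for m in members:
            labels[m] = c
    return labels

def mean_silhouette(points, labels):
    """Average silhouette width over all points (singletons count as 0)."""
    n = len(points)
    total = 0.0
    for i in range(n):
        by_cluster = {}
        for j in range(n):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(
                    math.dist(points[i], points[j]))
        own = by_cluster.pop(labels[i], None)
        if not own or not by_cluster:
            continue  # singleton cluster or a single cluster overall
        a = sum(own) / len(own)                                 # cohesion
        b = min(sum(d) / len(d) for d in by_cluster.values())  # separation
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n

# Hypothetical data: two groups in a plane of two abundance-like quantities.
random.seed(0)
centres = [(0.0, 0.3), (0.5, -0.2)]  # assumed values, for illustration only
points = [(random.gauss(cx, 0.05), random.gauss(cy, 0.05))
          for cx, cy in centres for _ in range(30)]

# Compare splits into 2..5 groups by their average silhouette width.
widths = {k: mean_silhouette(points, average_linkage(points, k))
          for k in range(2, 6)}
best_k = max(widths, key=widths.get)
for k, w in widths.items():
    print(f"k={k}: average silhouette width = {w:.3f}")
print("preferred number of groups:", best_k)
```

In this toy setting the split with the largest average silhouette width is the preferred one. A width close to 1 indicates well-separated groups, while values near 0 suggest that the substructure may not be real.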

Full Text