Abstract

Background
Recently, there has been a growing interest in using cell differentiation inducers to produce platelets in vitro by inducing megakaryocytic differentiation in cell lines. During this process, cells undergo polyploidization through endomitosis. To measure the degree of polyploidy, cell cycle analysis by flow cytometry is commonly used. However, manual gating can be time-consuming and prone to human error, especially when analyzing large datasets. To address this issue, this study aimed to compare the performance of unsupervised machine learning algorithms for polyploidy analysis of flow cytometry data.

Methods
The K562 cell line (Korea Cell Line Bank, Korea) was cultured under four conditions, including 1 nM and 2.5 nM of PMA (phorbol 12-myristate 13-acetate; Sigma Aldrich, USA) and 5 μg/mL of phytosphingosine (TCI, Japan), for four days. Propidium iodide (Sigma Aldrich, USA) was used to stain the cells for ploidy analysis, and flow cytometry (Navios, Beckman Coulter Inc, USA) was used to obtain the intensity values. Twelve flow cytometry measurements were obtained for K562 cells cultured under four different conditions, with each condition tested in triplicate. Unsupervised machine learning algorithms, including K-Means, Bisecting K-Means (BKM), Mini Batch K-Means (MBKM), Agglomerative Clustering (AC), Gaussian Mixture Models (GMM), K-Medoids, and Partitioning Around Medoids (PAM), were used to generate boundaries among three clusters (2n, 4n, and ≥8n). The boundary values generated by each algorithm were compared by calculating the standard deviation index (SDI) for each boundary. The concordance of each algorithm with manual gating of flow cytometry data was evaluated by calculating residual errors.

Results
A total of 168 SDI values were obtained from seven different unsupervised machine learning algorithms for two boundaries among 2n, 4n, and ≥8n clusters of four culture conditions in triplicate.
When comparing the average (±95% confidence interval) of SDI for each algorithm for the entire dataset, K-Means showed the most acceptable average SDI value of −0.07 (±0.17) with the narrowest confidence interval, followed by −0.13 (±0.54) of AC, −0.26 (±0.18) of MBKM, −0.53 (±0.2) of K-Medoids, −0.58 (±0.2) of PAM, 0.74 (±0.3) of GMM, and 0.84 (±0.35) of BKM. In addition, the average (±95% confidence interval) of the absolute value of the residual errors of each algorithm was calculated, with K-Means showing the lowest value of 6.67 (±2.3), indicating the highest concordance with manual gating. AC, MBKM, K-Medoids, PAM, BKM, and GMM had higher average residual errors of 10.04 (±2.5), 10.46 (±4.07), 11.13 (±4.65), 12.04 (±4.83), 12.54 (±3.0), and 12.75 (±3.5), respectively.

Conclusion
K-Means demonstrated the best performance for clustering each ploidy among the seven unsupervised machine learning algorithms tested for the flow cytometry data. In addition, the algorithms achieved acceptable concordance with manual gating in most cases in this study. Therefore, unsupervised machine learning algorithms show promise in automating the gating of flow cytometry data for polyploidy analysis and measuring megakaryocytic differentiation in cell lines.
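
The workflow described in the Methods (cluster the propidium iodide intensities into three groups, read off two boundaries, then score each boundary by SDI and by its residual against manual gating) can be illustrated with a minimal sketch. Several things here are assumptions rather than details given in the abstract: the intensities are synthetic, the clustering is a plain one-dimensional Lloyd's K-Means written with the standard library instead of the software used in the study, a boundary is taken as the midpoint between adjacent centroids, the SDI is taken as (value − mean of all algorithms' values) / their standard deviation, and the residual is taken as algorithm boundary minus manual boundary. All function names are hypothetical.

```python
import random
import statistics


def kmeans_1d(values, k=3, iters=100):
    """Plain Lloyd's algorithm on 1-D data; returns centroids in ascending order."""
    lo, hi = min(values), max(values)
    # Deterministic start: centroids spaced evenly across the data range.
    centroids = [lo + (i + 0.5) * (hi - lo) / k for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # Keep the old centroid if a cluster happens to be empty.
        updated = [statistics.fmean(c) if c else centroids[j]
                   for j, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return sorted(centroids)


def boundaries(centroids):
    """Assumed boundary: midpoint between adjacent centroids, where the
    nearest-centroid assignment switches from one cluster to the next."""
    return [(a + b) / 2 for a, b in zip(centroids, centroids[1:])]


def sdi(value, peer_values):
    """Assumed SDI: (value - mean of peer values) / sample SD of peer values."""
    return (value - statistics.fmean(peer_values)) / statistics.stdev(peer_values)


def residual(algorithm_boundary, manual_boundary):
    """Assumed residual error: algorithm boundary minus the manual gate."""
    return algorithm_boundary - manual_boundary


# Synthetic intensities (NOT study data): peaks standing in for 2n, 4n, and >=8n.
random.seed(0)
data = ([random.gauss(200, 15) for _ in range(600)]
        + [random.gauss(400, 25) for _ in range(300)]
        + [random.gauss(800, 50) for _ in range(100)])

b = boundaries(kmeans_1d(data))
print("2n/4n boundary:", round(b[0], 1), " 4n/>=8n boundary:", round(b[1], 1))
```

In a comparison like the one reported, `sdi` would be called once per boundary with the values produced by all seven algorithms as `peer_values`, and `residual` would be called against the manually gated boundary for the same measurement.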