Abstract

High parameter flow cytometry is a widely used tool for accurate immunophenotyping and diagnostics in immunology, oncology, and many other disciplines. Over the past decade, several methods have been developed for automated clustering and dimensionality reduction of high parameter flow cytometry data, which have sped up and simplified the discovery of cell populations not observed by manual gating strategies. However, the input and output of such tools are stochastic in nature, which makes their results difficult to reuse with novel samples. To address this challenge, we present a method that uses machine learning to predict cluster labels and dimensionality reduction coordinates for novel samples. As a proof of principle, we used high parameter (22 marker) flow cytometry data examining myeloid hematopoiesis in bone marrow aspirate samples. Training data for machine learning consisted of pooled normal bone marrow. Phenograph was used for initial population clustering and UMAP for dimensionality reduction. A random forest model was found to be most accurate in predicting Phenograph clusters (98%) on novel data, while a k-nearest neighbors (kNN) model was found to be most accurate for UMAP coordinate prediction. The utility of this model was assessed by examining acute myeloid leukemia samples with aberrant immunophenotypes, which were not included in the training data set. For these samples, predicted clusters and UMAP coordinates correlated with early progenitor populations from the training set. This approach allows de novo sample clustering and dimensionality reduction assignment onto an already established and characterized model, enabling rapid, automated interpretation of high dimensional flow cytometry data. Supported by a Cedars-Sinai Pathology and Laboratory Medicine Minigrant.
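
The idea of mapping a new event onto an already characterized reference can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a plain k-nearest neighbors lookup for both outputs (majority vote for the cluster label, neighbor averaging for the 2-D coordinates), whereas the abstract reports a random forest for the Phenograph labels. The data are synthetic stand-ins, and all names (`k_nearest`, `predict_event`, `train_coords`, and so on) are hypothetical.

```python
import math
import random
from collections import Counter


def k_nearest(train_X, query, k):
    """Return indices of the k reference events closest to `query` (Euclidean)."""
    dists = sorted((math.dist(row, query), i) for i, row in enumerate(train_X))
    return [i for _, i in dists[:k]]


def predict_event(train_X, train_labels, train_coords, query, k=5):
    """Map one new event onto the reference set.

    Cluster label: majority vote among the k nearest reference events.
    Embedding coordinates: mean of those neighbors' stored 2-D coordinates.
    """
    idx = k_nearest(train_X, query, k)
    label = Counter(train_labels[i] for i in idx).most_common(1)[0][0]
    x = sum(train_coords[i][0] for i in idx) / len(idx)
    y = sum(train_coords[i][1] for i in idx) / len(idx)
    return label, (x, y)


# Synthetic reference set (assumption): two well-separated populations
# measured on 22 markers, each with a precomputed label and 2-D coordinate.
random.seed(0)
N_MARKERS = 22
centers = {0: 0.0, 1: 5.0}
train_X, train_labels, train_coords = [], [], []
for label, c in centers.items():
    for _ in range(50):
        train_X.append([random.gauss(c, 0.5) for _ in range(N_MARKERS)])
        train_labels.append(label)
        # Placeholder for coordinates that a prior embedding run would supply.
        train_coords.append((random.gauss(2 * c, 0.3), random.gauss(-c, 0.3)))

# A new event drawn near population 1, never seen in the reference set.
new_event = [random.gauss(5.0, 0.5) for _ in range(N_MARKERS)]
pred_label, pred_xy = predict_event(train_X, train_labels, train_coords, new_event)
```

Because the reference labels and coordinates are fixed once, repeated calls on new samples give reproducible assignments, which is the property the abstract is after. In practice a library implementation with an indexed neighbor search would replace the brute-force distance scan used here.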

