Abstract

We present a framework for the identification of cell subpopulations in flow cytometry (FCM) data based on merging mixture components using the flowClust methodology. We show that the cluster merging algorithm under our framework improves model fit and provides a better estimate of the number of distinct cell subpopulations than either Gaussian mixture models or flowClust, especially for complicated flow cytometry data distributions. Our framework allows automated selection of the number of distinct cell subpopulations, and we are able to identify cases where the algorithm fails, making it suitable for application in a high-throughput FCM analysis pipeline. Furthermore, we demonstrate a method for summarizing complex merged cell subpopulations in a simple manner that integrates with the existing flowClust framework and enables downstream data analysis. We demonstrate the performance of our framework on simulated and real FCM data. The software is available in the flowMerge package through the Bioconductor project.

Highlights

  • Flow cytometry (FCM) can be applied in a high-throughput fashion to process thousands of samples per day

  • A more recent approach compensates for these effects by applying a data transformation during the model fitting process [6, 8]. This transformation makes the data more symmetric, while the use of a multivariate t distribution allows the model to handle outliers [6, 8, 9]. These model-based gating methods effectively amount to clustering of the data and generally employ likelihood-based measures such as the Bayesian information criterion (BIC) or Akaike information criterion (AIC) to select an appropriate model from a range of possibilities [10]

  • The simulated data sets were gated using the manual gates established on the original data for CD8+/CD4−, CD8−/CD4+, and CD8−/CD4− cell populations (Figure 6(a)). These manual gates were used to calculate misclassification rates for automated gating using the flowClustBIC, flowClustICL, flowMergeK, and GMMBIC models with the number of clusters fixed at the true number (K = 3) and with the number of clusters chosen by the optimal model
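
The misclassification rate described in the last highlight can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `misclassification_rate` and the toy labels are hypothetical, and the sketch assumes that cluster labels are arbitrary, so each cluster is first matched to a manual gate by trying every one-to-one assignment and keeping the one with the fewest disagreements. It also assumes there are no more clusters than gates, as in the fixed K = 3 case.

```python
from itertools import permutations


def misclassification_rate(manual, automated):
    """Fraction of events whose cluster disagrees with the manual gate.

    Cluster labels are arbitrary, so the rate is minimised over all
    one-to-one assignments of clusters to gates (an assumption of this
    sketch, feasible only for a small number of clusters).
    """
    if len(manual) != len(automated):
        raise ValueError("manual and automated labels must have equal length")
    if not manual:
        raise ValueError("no events to compare")

    gates = sorted(set(manual))
    clusters = sorted(set(automated))
    if len(clusters) > len(gates):
        raise ValueError("sketch assumes no more clusters than gates")

    best_errors = len(manual)
    # Try every way of assigning distinct gates to the clusters.
    for assigned in permutations(gates, len(clusters)):
        mapping = dict(zip(clusters, assigned))
        errors = sum(mapping[a] != m for m, a in zip(manual, automated))
        best_errors = min(best_errors, errors)
    return best_errors / len(manual)


# Hypothetical toy data: six events, three manual gates, three clusters.
manual_gates = ["CD8+", "CD8+", "CD4+", "CD4+", "DN", "DN"]
cluster_ids = [2, 2, 0, 0, 1, 0]
rate = misclassification_rate(manual_gates, cluster_ids)
```

In this toy example only the final event falls in a cluster that maps to a different gate, so one of six events is misclassified.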


Summary

Introduction

Flow cytometry (FCM) can be applied in a high-throughput fashion to process thousands of samples per day. A more recent approach applies a data transformation during the model fitting process [6, 8]. This transformation makes the data more symmetric, while the use of a multivariate t distribution allows the model to handle outliers [6, 8, 9]. These model-based gating methods effectively amount to clustering of the data and generally employ likelihood-based measures such as the Bayesian information criterion (BIC) or Akaike information criterion (AIC) to select an appropriate model (number of clusters) from a range of possibilities [10]. Employing the cluster merging algorithm under the flowClust framework provides a better fit and a better estimate of the number of distinct cell populations for complicated flow cytometry data distributions than any of the flowClustBIC, flowClustICL, GMMBIC, or GMMICL models.
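
As an illustration of how a likelihood-based criterion selects the number of clusters, the sketch below scores a set of candidate fits with BIC or AIC and picks the best one. It uses the textbook definitions BIC = -2 log L + p log n and AIC = -2 log L + 2p, where smaller is better; some software reports the negated quantity, so that larger is better. The function names, the log-likelihoods, and the parameter counts are all hypothetical values chosen for illustration, not results from the paper or calls into flowClust or flowMerge.

```python
import math


def bic(log_likelihood, n_params, n_events):
    """Bayesian information criterion (smaller is better in this convention)."""
    return -2.0 * log_likelihood + n_params * math.log(n_events)


def aic(log_likelihood, n_params):
    """Akaike information criterion (smaller is better in this convention)."""
    return -2.0 * log_likelihood + 2.0 * n_params


def select_k(fits, n_events, criterion="bic"):
    """Pick the number of clusters K with the lowest criterion value.

    `fits` maps K to (maximised log-likelihood, number of free parameters).
    Returns the chosen K and the score of every candidate.
    """
    if criterion == "bic":
        scores = {k: bic(ll, p, n_events) for k, (ll, p) in fits.items()}
    elif criterion == "aic":
        scores = {k: aic(ll, p) for k, (ll, p) in fits.items()}
    else:
        raise ValueError("criterion must be 'bic' or 'aic'")
    return min(scores, key=scores.get), scores


# Hypothetical fits for K = 1, 2, 3 on 500 events.
candidate_fits = {1: (-1000.0, 5), 2: (-900.0, 11), 3: (-895.0, 17)}
best_k, bic_scores = select_k(candidate_fits, n_events=500)
```

With these made-up numbers, going from two to three clusters improves the log-likelihood only slightly, so the penalty term dominates and K = 2 is selected.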
