Abstract

Part of the flow/mass cytometry data analysis process is aligning (matching) cell subsets between relevant samples. Current methods for this cluster-matching problem are computationally expensive, are affected by the curse of dimensionality, or fail when population patterns vary significantly between samples. Here, we introduce a quadratic form (QF)-based cluster matching algorithm (QFMatch) that is computationally efficient and accommodates cases where population locations differ significantly, or where populations even appear or disappear, from sample to sample. We demonstrate the effectiveness of QFMatch by evaluating sample datasets from immunology studies. The algorithm is based on a novel multivariate extension of the quadratic form distance for the comparison of flow cytometry data sets. We show that this QF distance has attractive computational and statistical properties that make it well suited for analysis tasks that involve the comparison of flow/mass cytometry samples.

Highlights

  • Most flow and mass cytometry applications in biomedical studies are based on comparisons between/among control and test samples

  • We discuss the limitations of currently available methods for cluster matching applications, and demonstrate that employing a multivariate extension of the quadratic form distance[3] overcomes key limitations

  • To pave the way toward a more robust solution of this problem, we developed QFMatch, a cluster matching method based on the quadratic form (QF) distance measure
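
For readers unfamiliar with the quadratic form distance, the following is a minimal sketch of its standard histogram version, not the multivariate extension or the QFMatch algorithm described in the paper. It assumes two samples have already been binned into the same set of bins; the function name `quadratic_form_distance`, the `centers` argument, and the similarity matrix `A[i, j] = 1 - d[i, j] / d_max` are illustrative assumptions rather than details taken from the text above.

```python
import numpy as np

def quadratic_form_distance(h, f, centers):
    """Standard QF distance between two histograms over shared bins.

    h, f    : bin frequencies of the two samples (same length, same bins)
    centers : (n_bins, n_dims) array of bin centers (assumed layout)
    """
    h = np.asarray(h, dtype=float)
    f = np.asarray(f, dtype=float)
    centers = np.asarray(centers, dtype=float)

    # Pairwise Euclidean distances between bin centers.
    diff = centers[:, None, :] - centers[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))

    # Similarity matrix: nearby bins count as similar, far bins as dissimilar.
    d_max = d.max()
    a = 1.0 - d / d_max if d_max > 0 else np.ones_like(d)

    # Quadratic form (h - f)^T A (h - f); clamp tiny negative rounding errors.
    delta = h - f
    q = float(delta @ a @ delta)
    return np.sqrt(max(q, 0.0))

# Three one-dimensional bins centered at 0, 1 and 2.
centers = [[0.0], [1.0], [2.0]]
near = quadratic_form_distance([1, 0, 0], [0, 1, 0], centers)
far = quadratic_form_distance([1, 0, 0], [0, 0, 1], centers)
```

In this toy case `far` exceeds `near`: unlike a bin-by-bin comparison, the distance grows as mass shifts to more distant bins, which is the property that makes this family of distances useful for comparing populations whose locations move between samples.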


Summary

Introduction

Most flow and mass cytometry applications in biomedical studies are based on comparisons between/among control and test samples. The need to facilitate these analyses, and make them more accurate, has motivated development of automated clustering and cluster matching methods for Hi-D flow and mass cytometry data. Both of these tasks (cluster identification and cluster matching) are highly challenging because they are subject to the “curse of dimensionality”, a well-known statistical problem for Hi-D data that compromises both statistical validity and computational performance[1,2]. The first approach clusters one sample at a time and then aligns/matches the cell subsets (clusters) present in multiple samples after clustering (e.g., as is done in the FLAME analysis[4] and the flowMatch package[5]). This conventional approach allows fast computational implementations in low dimensions. The second approach (e.g., Joint Clustering and Matching[6], ASPIRE[7]) alleviates some of these problems by creating a Hi-D template of meta-clusters (distinct biologically relevant cell types) in which all sample data are pooled, simultaneously clustered and matched.

