Abstract
Part of the flow/mass cytometry data analysis process is aligning (matching) cell subsets between relevant samples. Current methods for this cluster-matching problem are computationally expensive, are affected by the curse of dimensionality, or fail when population patterns vary significantly between samples. Here, we introduce a quadratic form (QF)-based cluster matching algorithm (QFMatch) that is computationally efficient and accommodates cases where population locations differ significantly (or populations even disappear or appear) from sample to sample. We demonstrate the effectiveness of QFMatch by applying it to sample datasets from immunology studies. The algorithm is based on a novel multivariate extension of the quadratic form distance for the comparison of flow cytometry data sets. We show that this QF distance has attractive computational and statistical properties that make it well suited for analysis tasks that involve the comparison of flow/mass cytometry samples.
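As a rough illustration of the underlying measure, the sketch below computes the standard quadratic form distance d(h, f) = sqrt((h - f)^T A (h - f)) between two normalized histograms over a fixed set of bins, where A[i][j] encodes how similar bins i and j are. It is the classic single-binning form, not the multivariate extension described above; the bin centers, the helper names `similarity_matrix` and `qf_distance`, and the similarity choice a_ij = 1 - d_ij / d_max are all assumptions made for illustration.

```python
import math


def similarity_matrix(centers):
    """Bin-similarity matrix A with a_ij = 1 - d_ij / d_max.

    centers: list of bin-center coordinates (tuples), one per bin.
    This weighting is an assumed, commonly used choice; it is not
    necessarily the one used by QFMatch.
    """
    n = len(centers)
    d = [[math.dist(centers[i], centers[j]) for j in range(n)] for i in range(n)]
    d_max = max(max(row) for row in d)
    if d_max == 0:  # single bin or all centers identical
        return [[1.0] * n for _ in range(n)]
    return [[1.0 - d[i][j] / d_max for j in range(n)] for i in range(n)]


def qf_distance(h, f, A):
    """Quadratic form distance sqrt((h - f)^T A (h - f)).

    h, f: histograms over the same bins, each normalized to sum to 1.
    """
    diff = [a - b for a, b in zip(h, f)]
    n = len(diff)
    q = sum(diff[i] * A[i][j] * diff[j] for i in range(n) for j in range(n))
    # Guard against small negative values (rounding, or an A that is
    # not positive semidefinite) before taking the square root.
    return math.sqrt(max(q, 0.0))


# Toy example: three bins on a line; all mass sits in bin 0, 1, or 2.
A = similarity_matrix([(0.0,), (1.0,), (2.0,)])
h, f, g = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
d_near = qf_distance(h, f, A)  # mass moved to the adjacent bin
d_far = qf_distance(h, g, A)   # mass moved two bins away
```

A bin-by-bin Euclidean distance would score both shifts identically (sqrt(2)), whereas here `d_near` is 1.0 and `d_far` is sqrt(2): the off-diagonal entries of A let the QF distance register that nearby bins are similar, which is the cross-bin behavior that distinguishes it from bin-by-bin measures.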
Highlights
Most flow and mass cytometry applications in biomedical studies are based on comparisons between/among control and test samples
We discuss the limitations of currently available methods for cluster matching applications, and demonstrate that employing a multivariate extension of the quadratic form distance[3] overcomes key limitations
To pave the way toward a more robust solution to the cluster matching problem, we developed QFMatch, a cluster matching method based on the quadratic form (QF) distance measure
Summary
Most flow and mass cytometry applications in biomedical studies are based on comparisons between/among control and test samples. The need to facilitate these analyses, and make them more accurate, has motivated the development of automated clustering and cluster matching methods for high-dimensional (Hi-D) flow and mass cytometry data. Both of these tasks (cluster identification and cluster matching) are highly challenging because they are subject to the “curse of dimensionality”, a well-known statistical problem for Hi-D data that compromises both statistical validity and computational performance[1,2]. Two general approaches have been taken. The first is to cluster one sample at a time and then align/match the cell subsets (clusters) present in multiple samples after clustering (e.g., as is done in the FLAME analysis[4] and the flowMatch package[5]). This conventional approach allows fast computational implementations in low dimensions. The second approach (e.g., Joint Clustering and Matching[6], ASPIRE[7]) alleviates some of these problems by creating a Hi-D template of meta-clusters (distinct biologically relevant cell types) in which all sample data are pooled, simultaneously clustered, and matched.
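To make the first, post-clustering approach concrete, here is a minimal sketch under stated assumptions: each sample has already been clustered, every cluster is summarized as a normalized histogram over bins shared by both samples, and clusters are paired greedily in order of increasing QF distance. The helper `greedy_match`, the `max_dist` threshold, and the toy histograms are hypothetical; greedy one-to-one pairing is a simple stand-in chosen for brevity, not the procedure used by FLAME, flowMatch, or QFMatch.

```python
import math


def qf_distance(h, f, A):
    """Quadratic form distance sqrt((h - f)^T A (h - f)) over shared bins."""
    diff = [a - b for a, b in zip(h, f)]
    n = len(diff)
    q = sum(diff[i] * A[i][j] * diff[j] for i in range(n) for j in range(n))
    return math.sqrt(max(q, 0.0))


def greedy_match(clusters_a, clusters_b, A, max_dist):
    """Pair clusters from two samples in order of increasing QF distance.

    clusters_a, clusters_b: dict mapping a cluster label to its normalized
    histogram over bins shared by both samples. Pairs farther apart than
    max_dist are never matched, so a population found in only one sample
    is reported as unmatched instead of being forced onto a partner.
    """
    candidates = sorted(
        (qf_distance(ha, hb, A), la, lb)
        for la, ha in clusters_a.items()
        for lb, hb in clusters_b.items()
    )
    matches, used_a, used_b = [], set(), set()
    for dist, la, lb in candidates:
        if dist > max_dist:
            break  # all remaining candidates are even farther apart
        if la in used_a or lb in used_b:
            continue  # one-to-one: each cluster is matched at most once
        matches.append((la, lb, dist))
        used_a.add(la)
        used_b.add(lb)
    unmatched_a = [la for la in clusters_a if la not in used_a]
    unmatched_b = [lb for lb in clusters_b if lb not in used_b]
    return matches, unmatched_a, unmatched_b


# Toy data: three equally spaced bins, A[i][j] = 1 - d_ij / d_max.
A = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.5], [0.0, 0.5, 1.0]]
sample_1 = {"pop1": [1.0, 0.0, 0.0], "pop2": [0.0, 0.0, 1.0]}
sample_2 = {"c1": [0.9, 0.1, 0.0], "c2": [0.0, 0.1, 0.9], "c3": [0.0, 1.0, 0.0]}
matches, only_in_1, only_in_2 = greedy_match(sample_1, sample_2, A, max_dist=0.5)
```

Here `pop1` pairs with `c1` and `pop2` with `c2` (each at a QF distance of about 0.1), while `c3` sits at distance 1.0 from both and is left unmatched, mimicking a population that appears in only one sample. A greedy pass is not guaranteed to minimize the total distance; an optimal assignment solver could replace it, at the cost of extra handling for clusters that should stay unmatched.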