Introduction
We have established an international collaboration between the DKMS (Germany) and the National Marrow Donor Program (NMDP; USA) to characterize more than 2000 donor peripheral blood stem cell (PBSC) grafts. We aim to correlate graft immunophenotype with patient outcomes and to perform additional analyses on biobanked aliquots of selected samples, in order to better understand how features of the donor graft relate to key patient outcomes such as graft-versus-host disease, relapse, and overall survival. Here, we present an interim analysis, focused on the CD8 T cell compartment, that describes relationships between donor characteristics and graft composition. To integrate cytometry data from the US and German analytical laboratories, we have developed a novel computational clustering method.

Methods
A total of 300 donor PBSC graft samples were collected, processed, freshly stained (34-color panel for immunophenotyping of lymphocytes and hematopoietic stem cells), and analyzed on a Cytek Aurora instrument at the NMDP contract laboratory (Roswell Park, Buffalo, NY). An additional 53 samples were processed in the DKMS laboratory in Dresden, Germany, using an equivalent antibody panel and instrument. Our analytical pipeline integrates these data in three steps. First, we group samples by facility and, for computational convenience, by month of acquisition to form "batches" for initial clustering. Second, within each batch we perform Leiden clustering followed by subclustering, which generates a large number of clusters per batch. Third, to integrate data across batches, we cluster the subcluster centroids from all batch-specific clusterings together. This yields a global clustering for each stream (e.g., the CD8+ T cells). Per-sample cluster frequencies then serve as features for quantifying links between immunophenotype and donor metadata (e.g., CMV serostatus, age, and sex). Our approach offers an important complement to traditional gating and allows full exploration of the data in an unbiased fashion.
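The integration steps described in the Methods can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the pipeline used in the study: plain k-means stands in for Leiden clustering and subclustering, the events are synthetic two-marker points, and every name (`kmeans`, `integrate`, `toy_events`, `k_batch`, `k_meta`, the batch and sample labels) is hypothetical.

```python
import random
from collections import Counter, defaultdict

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))

def kmeans(points, k, iters=20):
    """Plain k-means (a stand-in for Leiden clustering in this sketch)."""
    # Farthest-point initialisation keeps the example deterministic.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return [nearest(p, centroids) for p in points], centroids

def integrate(batches, k_batch=4, k_meta=2):
    """batches: {batch_id: [(sample_id, marker_vector), ...]}.
    Returns {sample_id: {metacluster: frequency}}."""
    pooled, owner, batch_labels = [], [], {}
    # Step 1: over-cluster each batch separately and keep the centroids.
    for batch_id, events in batches.items():
        labels, cents = kmeans([vec for _, vec in events], k_batch)
        batch_labels[batch_id] = labels
        for c, cent in enumerate(cents):
            pooled.append(cent)
            owner.append((batch_id, c))
    # Step 2: cluster the pooled centroids from all batches (metaclustering).
    meta_labels, _ = kmeans(pooled, k_meta)
    meta_of = dict(zip(owner, meta_labels))
    # Step 3: map each event to its metacluster and count per sample.
    counts = defaultdict(Counter)
    for batch_id, events in batches.items():
        for (sample_id, _), lab in zip(events, batch_labels[batch_id]):
            counts[sample_id][meta_of[(batch_id, lab)]] += 1
    return {s: {m: n / sum(c.values()) for m, n in c.items()}
            for s, c in counts.items()}

def toy_events(sample_id, n_a, n_b, shift, rng):
    """Synthetic events from two populations; `shift` mimics a mild batch effect."""
    def draw(center):
        return [center + shift + rng.uniform(-1, 1) for _ in range(2)]
    return ([(sample_id, draw(0.0)) for _ in range(n_a)]
            + [(sample_id, draw(10.0)) for _ in range(n_b)])

rng = random.Random(0)
batches = {
    "siteA_month1": toy_events("s1", 40, 10, 0.0, rng) + toy_events("s2", 10, 40, 0.0, rng),
    "siteB_month1": toy_events("s3", 40, 10, 0.5, rng),
}
freqs = integrate(batches)
for sample_id in sorted(freqs):
    print(sample_id, freqs[sample_id])
```

Because only centroids are pooled, the global step operates on a few points per batch rather than on every event, which is what makes month-by-month batching computationally convenient. Metacluster IDs are arbitrary labels, so downstream analyses should compare frequencies by cluster identity rather than by ID order.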
Results
A total of 353 donor products were included in the analysis. The median donor age was 28 years (range 18-60); 59% were female and 65% were CMV positive. Initial metaclustering demonstrated that the 53 DKMS samples and the 300 NMDP samples are each well represented in all clusters, and that site-driven batch effects, while present, are mild. We initially focused on the CD8 T cell compartment (UMAP in Fig 1A) and found several phenotypic differences between CMV-seronegative and CMV-seropositive donors. CD8 T cell clusters 10 and 11 (both TCRαβ+ CD45RA+ CCR7+ HLA-DR+ CD57+; cluster 10 CCR9+; cluster 11 CCR9-) were present at higher frequency in CMV+ donors (n = 116) than in CMV- donors (n = 220; p < 0.0001; Fig 1B). Conversely, clusters 9 and 12, which are enriched for mucosal-associated invariant T (MAIT) cells, were significantly reduced in CMV+ donors (p < 0.001). Of note, CMV serostatus was not significantly associated with age (median age 29.4 years in CMV+ donors vs 28.4 years in CMV- donors). We also observed an association between age and cluster 8 (TCRαβ+ CD45RA+ CCR7+ HLA-DR+ CD38+; consistent with an activated phenotype; p = 0.0009; Fig 1B). There was a trend toward decreased MAIT frequencies with increasing age (cluster 9; p = 0.058). When we analyzed MAIT cell frequency as defined by traditional flow cytometry gating (CD3+ CD8+ CD161+ Vα7.2+), we observed a statistically significant relationship between younger donor age and increased MAIT cell frequency (p = 0.0002), consistent with the trend seen in cluster 9. Sex at birth was not linked with immunophenotype in our analyses thus far.

Conclusions
In this initial analysis of our multi-center observational study of PBSC graft products, we demonstrate the development and application of new computational tools for large-cohort flow cytometric data analysis. CMV-negative serostatus and younger donor age are both associated with an increased frequency of MAIT cells in the graft.
We also found differences in CD8 T cell activation status associated with age and CMV serostatus. In future work, we will assess how these populations, and cluster frequencies more broadly, relate to key patient outcomes, including graft-versus-host disease (GVHD), relapse, and overall survival. Sample collection and analysis continue at both the US and German sites as a collaborative effort between the NMDP and the DKMS.
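The group comparisons reported in the Results (cluster frequency in CMV+ versus CMV- donors) can be illustrated with a small worked example. The abstract does not state which statistical test produced the reported p-values, so the sketch below assumes a two-sided permutation test on the difference in mean cluster frequency; the function name `permutation_test` and all frequencies are made up for illustration and are not study data.

```python
import random
from statistics import mean

def permutation_test(group_a, group_b, n_perm=5000, seed=0):
    """Two-sided permutation test on the difference in mean cluster frequency.
    Assumed test for illustration; not necessarily the one used in the study."""
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        # Shuffle group labels and recompute the difference under the null.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed) - 1e-12:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical per-donor frequencies of one CD8 T cell cluster.
cmv_pos = [0.12, 0.15, 0.10, 0.18, 0.14, 0.16]
cmv_neg = [0.03, 0.05, 0.02, 0.06, 0.04, 0.05]

diff, p = permutation_test(cmv_pos, cmv_neg)
print(f"difference in mean frequency = {diff:.3f}, permutation p = {p:.4f}")

# Identical groups give an observed difference of zero, so every permutation counts.
_, p_null = permutation_test([0.1, 0.2, 0.3], [0.1, 0.2, 0.3])
```

A permutation test makes no distributional assumption about cluster frequencies, which are bounded and often skewed. When many clusters are tested against the same donor variable, a multiple-comparison correction would also be needed.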