Abstract

Background: Single-cell RNA sequencing (scRNAseq) enables gene expression profiling of individual cells, providing new opportunities to identify and characterize cell types and tumor cell states, typically through some form of clustering analysis. Determining the optimal number of groups is a crucial step in data interpretation, yet one that is often ignored. Hence, there is still a great need for objective tools that estimate the data-driven optimal number of groups/clusters.

Methods: We developed “MultiK”, a data-driven tool for objective selection of the optimal number of groups/clusters. It combines clustering solutions from multiple resolutions through a consensus clustering approach based on repeated sub-sampling. MultiK produces multiple diagnostic plots that highlight the number of meaningful groups in the data, and makes objective group number suggestions that encompass both high- and low-resolution parameters.

Results: MultiK successfully identified the ground-truth number of groups in a controlled data set consisting of a mixture of 3 breast cell lines, and was sensitive in identifying classes and subclasses in a synthetic “spike in” experiment. We further applied MultiK to identify reproducible groups in complex tissue data sets, including mouse mammary glands and multiple T cell data sets. In both cases, we identified most of the previously known subsets/cell populations without needing any prior knowledge. In the human T cell case, MultiK identified a total of 12 reproducible T cell subsets spanning 6 different data sets that represent multiple cancer types. Moreover, consistent with previous findings, some of these reproducible T cell signatures, including Treg and multiple CD8 subsets, showed prognostic value in predicting breast cancer patient survival. In particular, we found that the CD4 naïve T cell signature was significantly associated with overall survival in multiple patient sets, including both HER2+ and TNBC. We also found that the two CD4 T follicular helper subsets were significantly correlated with survival in both HER2+ and TNBC samples. Furthermore, consistent with the previous finding that the CD8 Trm signature is associated with good prognosis, we found that our CD8 T resident memory signature was prognostic within the HER2+ and TNBC sets.

Conclusion: MultiK improves current scRNAseq cluster/group number estimation using an objective, data-driven approach. This methodology should be important because, using our T cell analyses as an example, it shows that many previously published analyses likely overestimated the true number of reproducible T cell subsets in the tumor immune microenvironment, which may lead to irreproducible findings across studies. Additional analyses on tumor cell subsets are currently underway.

Citation Format: Siyao Liu, Aatish Thennavan, J.S. Marron, Charles Perou. An automated tool to determine optimal cluster numbers in single-cell RNA sequencing data identifies key prognostic subsets of T cells in breast tumors [abstract]. In: Proceedings of the 2021 San Antonio Breast Cancer Symposium; 2021 Dec 7-10; San Antonio, TX. Philadelphia (PA): AACR; Cancer Res 2022;82(4 Suppl):Abstract nr P3-09-08.
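The general idea described in the Methods, repeated sub-sampling combined with clustering at several resolutions, can be illustrated with a small toy sketch. This is not the MultiK implementation: the 1-D data, the distance-threshold clusterer standing in for a resolution-driven method, the function names, the 80% sub-sampling fraction, and the use of the proportion of ambiguous clustering (PAC) as a stability score are all assumptions made for illustration only.

```python
import random
from collections import Counter, defaultdict
from itertools import combinations


def threshold_clusters(points, idx, radius):
    """Toy stand-in for a resolution-driven clusterer (an assumption):
    sampled 1-D points are split wherever a sorted gap exceeds `radius`."""
    order = sorted(idx, key=lambda i: points[i])
    labels, current = {}, 0
    for prev, cur in zip([None] + order[:-1], order):
        if prev is not None and points[cur] - points[prev] > radius:
            current += 1
        labels[cur] = current
    return labels


def consensus_over_resolutions(points, radii, n_reps=20, frac=0.8, seed=0):
    """Sub-sample repeatedly, cluster at every resolution, and record
    (a) how often each K occurs and (b) a per-K consensus value per pair."""
    rng = random.Random(seed)
    n = len(points)
    k_counts = Counter()
    together = defaultdict(Counter)  # K -> pair -> times co-clustered
    sampled = defaultdict(Counter)   # K -> pair -> times both were sampled
    for _ in range(n_reps):
        idx = rng.sample(range(n), int(frac * n))
        for radius in radii:
            labels = threshold_clusters(points, idx, radius)
            k = len(set(labels.values()))
            k_counts[k] += 1
            for i, j in combinations(sorted(idx), 2):
                sampled[k][(i, j)] += 1
                if labels[i] == labels[j]:
                    together[k][(i, j)] += 1
    consensus = {
        k: {pair: together[k][pair] / cnt for pair, cnt in sampled[k].items()}
        for k in sampled
    }
    return k_counts, consensus


def pac(consensus_k, lo=0.1, hi=0.9):
    """Proportion of pairs whose consensus is ambiguous (strictly between lo and hi)."""
    vals = list(consensus_k.values())
    return sum(lo < v < hi for v in vals) / len(vals)


# Hypothetical data: three well-separated groups of ten points each.
points = [10 * g + 0.1 * i for g in range(3) for i in range(10)]
radii = [1.0, 2.0, 5.0, 15.0]  # hypothetical "resolutions"
k_counts, consensus = consensus_over_resolutions(points, radii)

for k, count in sorted(k_counts.items()):
    print(f"K={k}: seen in {count} runs, PAC={pac(consensus[k]):.2f}")
```

In this toy setting, a K that recurs across many sub-samples and resolutions while showing few ambiguous pairs is a natural candidate for the number of groups; real scRNAseq data would need a proper graph-based or model-based clusterer in place of the threshold rule.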

