Abstract Single-nucleus joint ATAC- and RNA-sequencing (snMultiome) can be used to identify functionally divergent cell subpopulations based on their transcriptomic and epigenetic profiles within complex samples. Accurate cell type annotation is critical to successful snMultiome data analysis. Several computational methods have been developed for automatic annotation. Traditional cell type annotation methods initially cluster cells using unsupervised learning methods based on the gene expression profiles, then label the clusters using aggregated cluster-level expression profiles and marker genes. These methods rely heavily on the clustering results. As the purity of clusters cannot be guaranteed, false detection of cluster features may lead to incorrect annotations. Further, canonical cell surface markers may not always be suitable to be applied in single-nucleus RNA-seq studies because single-nucleus RNA-seq generally yields lower detected transcript numbers compared to typical single-cell RNA-seq. Moreover, cell type marker genes in the snRNA-seq data may differ from the ones obtained with scRNA-seq data, reflecting biological differences in the cytoplasmic and nuclear RNA pools. Lastly, the data obtained from malignant cells are best left out in establishing cell type reference data because they are too heterogeneous and patient-specific. Reference-based automated algorithms such as SingleR enable quick and unbiased classifications by leveraging a collection of built-in reference data sets for human (e.g. Human Primary Cell Atlas (microarray-based) and the combined Blueprint Epigenomics and Encode data set (RNA-seq-based)). Still, SingleR may return erroneous cell type classifications. Our dataset was generated using the 10x Genomics snMultiome platform to yield 296,557 nuclei from 82 frozen breast tumors, representing patients from diverse genetic ancestral background. Using these data, we sought to improve the accuracy of cell type annotation by SingleR. To achieve this, we first separated malignant and non-malignant cells based on DNA copy number aberrations (aneuploidy) through CopyKAT. For cells determined to be non-malignant, we built the custom reference from snRNA-seq data set, recently made available by The Human Breast Cell Atlas, and then applied singleR with a custom reference where each cell type is represented by single-cells of that type, allowing a well-founded estimate of the confidence with which a cell type call can be made. Using this approach, we successfully identified 11 distinct cell types for non-malignant cells, including fibroblast, adipocyte, pericyte, basal, luminal-secretory, luminal-HR, myeloid, mast, vascular, lymphatic, and T-cells, which can then be further subclassified. Furthermore, we interrogated each cluster using known canonical markers and transferred the cell type labels to snATAC-seq. This approach enabled us to link peaks to genes in each cell type. We believe this new approach that refines SingleR can greatly improve accuracy and minimize misclassification when annotating cell types in breast tumors using snMultiome data. Citation Format: Huaitian Liu, Alexandra Harris, Brittany Jenkins-Lord, Tiffany H. Dorsey, Francis Makokha, Shahin Sayed, Gretchen Gierach, Stefan Ambs. Cell type annotation using singleR with custom reference for single-nucleus multiome data derived from frozen human breast tumors [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 2 (Late-Breaking, Clinical Trial, and Invited Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(7_Suppl):Abstract nr LB240.
Read full abstract