Single-cell omics techniques make it possible to analyze individual cells in biological samples, providing a more detailed understanding of cellular heterogeneity and biological systems. Accurate identification of cell types is critical for single-cell RNA sequencing (scRNA-seq) analysis. However, scRNA-seq data are usually high-dimensional and sparse, which makes them challenging to analyze. Existing cell-type annotation methods are either constrained in how they model scRNA-seq data or fail to consider long-range dependencies among characterized genes. In this work, we developed scSwinFormer, a Transformer-based deep learning method for cell-type annotation of large-scale scRNA-seq data. The smooth gene embedding module performs sequence modeling of scRNA-seq data, and the self-attention module then captures potential dependencies among genes. The Cell Token subsequently synthesizes the global information inherent in the scRNA-seq data, facilitating accurate cell-type annotation. We evaluated our model against current state-of-the-art scRNA-seq cell-type annotation methods on multiple real data sets. scSwinFormer outperforms these methods in both external and benchmark data set experiments.
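The pipeline described above (gene embedding, self-attention over gene tokens, and a Cell Token that aggregates global information for classification) can be illustrated with a minimal NumPy sketch. This is not the scSwinFormer implementation: the abstract does not give architectural details, so the patch-based tokenization, the single attention head, the random weights, and every name below (`annotate_cell`, `w_embed`, `cell_token`, and so on) are assumptions made purely for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def annotate_cell(expr, w_embed, cell_token, w_q, w_k, w_v, w_out):
    """Return cell-type probabilities for one expression vector of shape (n_genes,)."""
    # Hypothetical tokenization: group genes into fixed-size patches and
    # project each patch to a d-dimensional token.
    patch_size = w_embed.shape[0]
    tokens = expr.reshape(-1, patch_size) @ w_embed      # (n_tokens, d)
    # Prepend the cell token so it can attend to every gene token.
    x = np.vstack([cell_token, tokens])                  # (n_tokens + 1, d)
    # Single-head scaled dot-product self-attention.
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))
    h = attn @ v
    # Row 0 is the cell token: it now summarizes all gene tokens.
    return softmax(h[0] @ w_out)

# Toy usage with random (untrained) parameters, for shape checking only.
rng = np.random.default_rng(0)
n_genes, patch_size, d, n_types = 64, 8, 16, 5
params = {
    "w_embed": rng.normal(size=(patch_size, d)),
    "cell_token": rng.normal(size=(1, d)),
    "w_q": rng.normal(size=(d, d)),
    "w_k": rng.normal(size=(d, d)),
    "w_v": rng.normal(size=(d, d)),
    "w_out": rng.normal(size=(d, n_types)),
}
expr = rng.poisson(1.0, size=n_genes).astype(float)  # sparse-ish toy counts
probs = annotate_cell(expr, **params)
```

Because the weights are random, the output carries no biological meaning. The sketch only shows how a single extra token can pool information from all gene tokens through attention before a classifier head is applied.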