Single-cell RNA sequencing (scRNA-seq) analysis can address a wide range of biological questions. One key application is annotating query datasets of unknown cell types with well-annotated external reference datasets. However, the performance of existing supervised and semi-supervised methods depends heavily on the quality of the reference (source) data. These methods also often struggle with batch effects from different platforms when multiple reference or query datasets are involved, which makes precise annotation difficult. We developed transCAE, a robust transfer learning-based algorithm for single-cell annotation that integrates unsupervised dimensionality reduction with supervised cell type classification. This approach draws on information from both the reference and query datasets to classify the cells in the query data precisely. Extensive evaluations show that transCAE significantly improves classification accuracy and effectively mitigates batch effects. Compared with other state-of-the-art methods, transCAE performs better in experiments involving multiple reference or query datasets. These strengths make transCAE a strong choice for annotating scRNA-seq datasets.