Pathway analysis is a crucial analytical step in disease research using single-cell RNA sequencing (scRNA-seq) data, offering biological interpretations grounded in prior knowledge. However, currently available tools for generating cell-level pathway activity scores (PAS) are computationally inefficient on large-scale scRNA-seq datasets. Additionally, disease-related pathways are often identified through cross-condition comparisons within individual cell types, overlooking potential patterns that involve multiple cell types. Here, we present single-cell pathway activity factor analysis (scPAFA), a Python library designed for large-scale single-cell datasets. It enables rapid PAS computation and uncovers biologically interpretable, disease-related multicellular pathway modules, which are low-dimensional representations of disease-related PAS alterations across multiple cell types. Applications to colorectal cancer (CRC) datasets and to a large-scale lupus atlas of over 1.2 million cells demonstrated that scPAFA can achieve over 40-fold reductions in PAS computation runtime. scPAFA also identified reliable and interpretable multicellular pathway modules that capture the heterogeneity of CRC and the transcriptional abnormalities in lupus patients, respectively. Overall, scPAFA is a valuable addition to existing tools for disease research, with the potential to reveal complex disease mechanisms and support biomarker discovery at the pathway level.