Abstract

Background: Single-cell RNA sequencing (scRNA-seq) is a robust approach for cancer research, revealing insights into tumor heterogeneity, the tumor microenvironment (TME), and treatment response. However, scRNA-seq results frequently face reproducibility challenges due to i) high data complexity, intensified by human intervention, and ii) insufficient standardization of methods, which leads to inconsistent findings.

Methods: Here, we introduce SCRATCH (Single-Cell RnA-seq Toolkit and Pipeline for Cancer researcH), a Nextflow-based pipeline designed to improve reproducibility through a layered and modular architecture. SCRATCH follows the FAIR principles and the guidelines provided by the nf-core community.

Results/Discussion: The pipeline provides three execution modes: end-to-end, iterative, and custom. In the end-to-end mode, the pipeline processes data from raw input to downstream analyses automatically. This mode employs ranking- and aggregation-based approaches. The ranking approach leverages benchmark metrics to select the most suitable method at distinct steps (e.g., batch correction), thereby ensuring a consistent, data-driven selection. The aggregation approach, on the other hand, combines multiple predictions to increase confidence, for example in copy number variation (CNV) inference and malignant cell identification. These strategies minimize human intervention, which makes this mode well suited to beginners and enables rapid access to preliminary results and biological insights. Alternatively, the iterative mode allows intermediate users to define workflow breakpoints layer by layer. Users can pause, review results, and adjust decisions at specific stages (e.g., TME annotation), facilitating a "semi-supervised" approach for a more tailored analysis while retaining the SCRATCH framework.
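The ranking- and aggregation-based strategies of the end-to-end mode can be illustrated with a minimal Python sketch. This is not SCRATCH's implementation: the abstract does not specify which benchmark metrics are used, how ranks are combined, or how predictions are aggregated. The sketch assumes mean rank across metrics (higher metric values being better) for method selection and a simple majority vote for combining per-cell calls. All function names, method names, tool names, metric names, and values below are hypothetical.

```python
from collections import Counter
from statistics import mean


def select_by_rank(scores):
    """Pick the method with the best (lowest) mean rank across metrics.

    scores: {method: {metric: value}}; assumes higher values are better.
    Ties are broken alphabetically by method name (an arbitrary choice).
    """
    methods = sorted(scores)
    metrics = sorted(next(iter(scores.values())))
    ranks = {m: [] for m in methods}
    for metric in metrics:
        # Rank 1 goes to the highest-scoring method for this metric.
        ordered = sorted(methods, key=lambda m: scores[m][metric], reverse=True)
        for position, method in enumerate(ordered, start=1):
            ranks[method].append(position)
    mean_rank = {m: mean(r) for m, r in ranks.items()}
    best = min(methods, key=lambda m: mean_rank[m])
    return best, mean_rank


def aggregate_calls(calls_per_tool):
    """Majority vote per cell over several tools' predictions.

    calls_per_tool: {tool: {cell_id: label}}.
    Returns {cell_id: (consensus_label, fraction_of_tools_agreeing)}
    for the cells that every tool has called.
    """
    shared_cells = set.intersection(*(set(c) for c in calls_per_tool.values()))
    consensus = {}
    for cell in sorted(shared_cells):
        votes = Counter(calls[cell] for calls in calls_per_tool.values())
        label, count = votes.most_common(1)[0]
        consensus[cell] = (label, count / len(calls_per_tool))
    return consensus


# Hypothetical benchmark metrics for three batch-correction methods.
scores = {
    "method_a": {"batch_mixing": 0.80, "bio_conservation": 0.60},
    "method_b": {"batch_mixing": 0.70, "bio_conservation": 0.90},
    "method_c": {"batch_mixing": 0.90, "bio_conservation": 0.85},
}
best, mean_rank = select_by_rank(scores)

# Hypothetical malignant/normal calls from three independent predictors.
calls = {
    "tool_1": {"cell_1": "malignant", "cell_2": "normal", "cell_3": "malignant"},
    "tool_2": {"cell_1": "malignant", "cell_2": "normal", "cell_3": "normal"},
    "tool_3": {"cell_1": "malignant", "cell_2": "malignant", "cell_3": "normal"},
}
consensus = aggregate_calls(calls)

print(best, mean_rank)
print(consensus)
```

In this toy input, the agreement fraction returned alongside each consensus label plays the role of the "confidence level": a cell called identically by all tools scores 1.0, while a two-to-one split scores about 0.67.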
Thirdly, the custom mode enables targeted execution of individual modules for specific tasks (e.g., trajectory analysis and cell-cell communication), allowing experienced users to bypass the SCRATCH workflow and use it as an on-demand toolkit. This mode leverages pipeline parallelism for efficient processing, which makes it well suited to ongoing single-cell projects. In all modes, SCRATCH produces HTML reports to ensure traceability and reproducibility.

Conclusion: SCRATCH, an evolving project, currently comprises 5 subworkflows, 18 modules, and 25 tools. We envisage SCRATCH as an open-source tool and invite developers to leverage its modules in their own pipelines. For more information, please visit https://break-through-cancer.github.io/btc-scrna-training.

Citation Format: Andre F. Fonseca, Guangchun Han, Marcel Ribeiro-Dantas, Enyu Dai, Diljot Grewal, Matthew Zatzman, Eliyahu Havasov, Andrew McPherson, Break Through Cancer, Data Science TeamLab, Michael N. Noble, Rameen Beroukhim, Rachel Karchin, Sohrab P. Shah, Linghua Wang. Scratch: A highly modular pipeline for single-cell cancer research [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 863.