Abstract

We present pipeComp (https://github.com/plger/pipeComp), a flexible R framework for pipeline comparison handling interactions between analysis steps and relying on multi-level evaluation metrics. We apply it to the benchmark of single-cell RNA-sequencing analysis pipelines using simulated and real datasets with known cell identities, covering common methods of filtering, doublet detection, normalization, feature selection, denoising, dimensionality reduction, and clustering. pipeComp can easily integrate any other step, tool, or evaluation metric, allowing extensible benchmarks and easy applications to other fields, as we demonstrate through a study of the impact of removal of unwanted variation on differential expression analysis.

Highlights

  • Single-cell RNA-sequencing and the set of attached analysis methods are evolving fast, with more than 560 software tools available to the community [1], roughly half of which are dedicated to tasks related to data processing such as clustering, ordering, dimension reduction, or normalization

  • Optional benchmark functions can be set for each step to provide standardized, multi-layered evaluation metrics

  • We investigated the impact of several filtering methods: four methods based on deviations to median absolute deviations (MADs) with increasing levels of stringency and two methods based on scater’s runPCA using all or selected covariates
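The MAD-based filters mentioned in the last highlight can be illustrated with a minimal sketch. It is not the code used in the study: the function names, the QC values, and the two thresholds are hypothetical, chosen only to show how lowering the number of MADs tolerated makes a filter more stringent.

```python
from statistics import median


def mad(values):
    """Unscaled median absolute deviation of a list of numbers."""
    med = median(values)
    return median(abs(v - med) for v in values)


def mad_outliers(values, n_mads=3.0):
    """Flag values lying more than n_mads MADs away from the median."""
    med = median(values)
    spread = mad(values)
    return [abs(v - med) > n_mads * spread for v in values]


# Hypothetical per-cell QC metric (for example, log total counts).
log_counts = [8.1, 8.3, 7.9, 8.0, 8.2, 8.4, 7.8, 3.0]

# Assumed thresholds: a lenient and a more stringent cutoff.
lenient = mad_outliers(log_counts, n_mads=5.0)
stringent = mad_outliers(log_counts, n_mads=1.5)

print("cells flagged (lenient):  ", sum(lenient))
print("cells flagged (stringent):", sum(stringent))
```

With these made-up values, the lenient cutoff flags only the clearly aberrant cell, while the stringent cutoff also flags a cell that sits near the upper edge of the main distribution.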


Summary

Introduction

Germain et al. Genome Biology (2020) 21:227

Single-cell RNA-sequencing (scRNAseq) and the set of attached analysis methods are evolving fast, with more than 560 software tools available to the community [1], roughly half of which are dedicated to tasks related to data processing such as clustering, ordering, dimension reduction, or normalization. A number of good comparison and benchmark studies have already been performed on various steps related to scRNAseq processing and analysis and can guide the choice of methodology [3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21]. These recommendations need constant updating and often leave open many details of an analysis. It is critical to evaluate both the individual effect of a preprocessing method and its positive or negative interaction with all parts of a workflow.
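To make the idea of evaluating interactions between steps concrete, the sketch below runs every combination of alternative methods across two toy steps and records an optional evaluation after each step. It is a language-agnostic illustration written in Python and does not reproduce pipeComp's R interface: the step names, the toy methods, the metrics, and the `run_all` helper are all assumptions made for this example.

```python
from itertools import product

# Hypothetical toy steps: each alternative maps a list of numbers to a new list.
alternatives = {
    "filtering": {
        "none": lambda xs: xs,
        "drop_negative": lambda xs: [x for x in xs if x >= 0],
    },
    "normalization": {
        "identity": lambda xs: xs,
        "max_scale": lambda xs: [x / max(xs) for x in xs],
    },
}

# Optional per-step evaluation functions (hypothetical metrics).
evaluators = {
    "filtering": lambda xs: {"n_kept": len(xs)},
    "normalization": lambda xs: {"max": max(xs)},
}


def run_all(data, alternatives, evaluators):
    """Run every combination of alternatives, evaluating after each step."""
    step_names = list(alternatives)
    results = {}
    # product() over the dicts iterates their keys, i.e. the method names.
    for combo in product(*(alternatives[s] for s in step_names)):
        current = list(data)
        metrics = {}
        for step, choice in zip(step_names, combo):
            current = alternatives[step][choice](current)
            if step in evaluators:
                metrics[step] = evaluators[step](current)
        results[combo] = metrics
    return results


results = run_all([4.0, -2.0, 8.0], alternatives, evaluators)
for combo, metrics in results.items():
    print(combo, metrics)
```

Because each combination is keyed by the full tuple of choices, the metrics of a downstream step can be compared across upstream alternatives, which is the kind of interaction the paragraph above refers to. This naive version recomputes every step for every combination.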
