Beyond benchmarking and towards predictive models of dataset-specific single-cell RNA-seq pipeline performance

Cindy Fang, Alina Selega, Kieran R Campbell

doi:10.1186/s13059-024-03304-9

Abstract

Background: The advent of single-cell RNA-sequencing (scRNA-seq) has driven significant computational methods development for all steps in the scRNA-seq data analysis pipeline, including filtering, normalization, and clustering. The large number of methods and their resulting parameter combinations has created a combinatorial set of possible pipelines to analyze scRNA-seq data, which leads to the obvious question: which is best? Several benchmarking studies compare methods but frequently find variable performance depending on dataset and pipeline characteristics. Alternatively, the large number of scRNA-seq datasets along with advances in supervised machine learning raise a tantalizing possibility: could the optimal pipeline be predicted for a given dataset?

Results: Here, we begin to answer this question by applying 288 scRNA-seq analysis pipelines to 86 datasets and quantifying pipeline success via a range of measures evaluating cluster purity and biological plausibility. We build supervised machine learning models to predict pipeline success given a range of dataset and pipeline characteristics. We find that prediction performance is significantly better than random and that in many cases pipelines predicted to perform well provide clustering outputs similar to expert-annotated cell type labels. We identify characteristics of datasets that correlate with strong prediction performance that could guide when such prediction models may be useful.

Conclusions: Supervised machine learning models have utility for recommending analysis pipelines and therefore the potential to alleviate the burden of choosing from the near-infinite number of possibilities. Different aspects of datasets influence the predictive performance of such models which will further guide users.
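The idea of predicting pipeline success from dataset characteristics can be illustrated with a minimal sketch. This is not the authors' model: it swaps in a simple k-nearest-neighbour average, and the dataset names, feature values, pipeline names, and scores are all made up for illustration. It assumes each previously analyzed dataset has a small numeric feature vector and one observed score per pipeline, and it recommends the pipeline with the highest predicted score for a new dataset.

```python
from math import dist

# Toy, made-up numbers for illustration only; none of these come from the paper.
# Each training dataset has a hypothetical feature vector (for example, scaled
# cell count and scaled sparsity) and an observed score for each hypothetical pipeline.
TRAIN = {
    "dataset_a": {"features": (0.1, 0.9), "scores": {"pipe_1": 0.80, "pipe_2": 0.40}},
    "dataset_b": {"features": (0.2, 0.8), "scores": {"pipe_1": 0.70, "pipe_2": 0.50}},
    "dataset_c": {"features": (0.9, 0.1), "scores": {"pipe_1": 0.30, "pipe_2": 0.90}},
}


def predict_scores(features, train, k=2):
    """Predict each pipeline's score as the mean over the k nearest training datasets."""
    # Rank training datasets by Euclidean distance in feature space and keep the closest k.
    nearest = sorted(train.values(), key=lambda d: dist(features, d["features"]))[:k]
    pipelines = nearest[0]["scores"].keys()
    return {p: sum(d["scores"][p] for d in nearest) / len(nearest) for p in pipelines}


def recommend(features, train, k=2):
    """Return the name of the pipeline with the highest predicted score."""
    predicted = predict_scores(features, train, k)
    return max(predicted, key=predicted.get)
```

Calling `recommend((0.15, 0.85), TRAIN)` picks the pipeline that scored best on the two most similar toy datasets. A real application would need held-out evaluation, for instance leaving whole datasets out of training, before trusting such recommendations.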


Journal: Genome Biology
Publication Date: Jun 17, 2024
License type: CC BY 4.0

