Abstract

Accounting for cell type compositions has been very successful at analyzing high-throughput data from heterogeneous tissues. Differential gene expression analysis at cell type level is becoming increasingly popular, yielding biomarker discovery in a finer granularity within a particular cell type. Although several computational methods have been developed to identify cell type-specific differentially expressed genes (csDEG) from RNA-seq data, a systematic evaluation is yet to be performed. Here, we thoroughly benchmark six recently published methods: CellDMC, CARseq, TOAST, LRCDE, CeDAR and TCA, together with two classical methods, csSAM and DESeq2, for a comprehensive comparison. We aim to systematically evaluate the performance of popular csDEG detection methods and provide guidance to researchers. In simulation studies, we benchmark available methods under various scenarios of baseline expression levels, sample sizes, cell type compositions, expression level alterations, technical noises and biological dispersions. Real data analyses of three large datasets on inflammatory bowel disease, lung cancer and autism provide evaluation in both the gene level and the pathway level. We find that csDEG calling is strongly affected by effect size, baseline expression level and cell type compositions. Results imply that csDEG discovery is a challenging task itself, with room to improvements on handling low signal-to-noise ratio and low expression genes.

Full Text
Published version (Free)

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call