Introduction: We present YADA, a cellular content deconvolution algorithm that estimates cell type proportions in heterogeneous cell mixtures from gene expression data. YADA uses curated signatures of cell type-specific marker genes, either derived from pure cell type expression matrices or supplied by the user. Method: YADA implements an accessible and extensible deconvolution framework that is uniquely able to accept marker genes alone as input. Because it relies only on literature-supported cell type-specific signatures rather than full transcriptomic profiles of purified isolates, it substantially lowers barriers to adoption. This flexibility does not come at the cost of rigor: through an integrated optimization scheme that balances multiple inference algorithms, its predictions perform on par with current methods. Compiled runtimes enable fast execution, and packaging as an importable Python toolkit encourages community contributions while keeping the codebase extensible. Result: Validation studies show that YADA matches or exceeds the performance of current deconvolution methods on benchmark datasets. To demonstrate its utility and enable immediate use, we provide an online Jupyter Notebook implementation with tutorials. Conclusion: YADA is an accurate, efficient, and extensible Python toolkit for cellular deconvolution analysis of heterogeneous gene expression data.
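For readers new to deconvolution, the sketch below shows the linear mixing model that reference-based methods commonly assume. In this model, a bulk expression vector is approximated as a signature matrix multiplied by non-negative cell type weights. It is not YADA's algorithm. The abstract does not specify YADA's optimization scheme, and YADA can also work from marker genes alone. This example uses plain non-negative least squares (NNLS) from SciPy instead and assumes a reference signature matrix with expression values. The function name `estimate_proportions` and the toy data are hypothetical.

```python
import numpy as np
from scipy.optimize import nnls


def estimate_proportions(signature, mixture):
    """Hypothetical helper: estimate cell type proportions for one bulk sample.

    signature: (n_genes, n_cell_types) reference expression matrix (assumed available)
    mixture:   (n_genes,) bulk expression vector for the same genes
    """
    # Solve min ||signature @ w - mixture||_2 subject to w >= 0
    weights, _residual_norm = nnls(signature, mixture)
    total = weights.sum()
    # Rescale the non-negative weights so they sum to 1 (skip if all are zero)
    return weights / total if total > 0 else weights


# Toy example: 4 marker genes, 2 cell types (made-up values for illustration)
S = np.array([[10.0, 0.0],
              [8.0, 1.0],
              [0.0, 12.0],
              [1.0, 9.0]])
true_p = np.array([0.3, 0.7])
m = S @ true_p  # noise-free synthetic mixture
p = estimate_proportions(S, m)
print(np.round(p, 3))
```

Because the synthetic mixture is noise-free and `S` has full column rank, NNLS recovers the proportions 0.3 and 0.7. Real data adds noise, platform effects, and missing cell types. Those effects are one reason methods like YADA combine more elaborate inference strategies.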