Abstract
Single cell RNA-seq (scRNAseq) workflows typically start with a count matrix and end with the clustering of sampled cells. While a range of methods have been developed to cluster scRNAseq datasets, no theoretical tools exist to explain why a particular cluster exists or why a hypothesized cluster is missing. Recently, several authors have shown that eigenvalues of scRNAseq count matrices can be approximated using random matrix models. In this work, we extend these previous works to the study of a scRNAseq workflow. We model scaled count matrices using random matrices with normally distributed entries. Using these random matrix models, we quantify the differential expression of a cluster and develop predictions for the workflow, and in particular clustering, as a function of the differential expression. We also use results from random matrix theory (RMT) to develop predictive formulas for portions of the scRNAseq workflow. Using simulated and real datasets, we show that our predictions are accurate if certain conditions hold on differential expression, with our RMT based predictions requiring particularly stringent condition. We find that real datasets violate these conditions, leading to bias in our predictions, but our predictions are better than a naive estimator and we point out future work that can improve the predictions. To our knowledge, our formulas represents the first predictive results for scRNAseq workflows.
Talk to us
Join us for a 30 min session where you can share your feedback and ask us any queries you have
Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.