Abstract

Scalable clustering algorithms are needed to analyze millions of cells in single cell RNA-seq (scRNA-seq) data. Here, we present an open source python package called FlowGrid that can integrate into the Scanpy workflow to perform clustering on very large scRNA-seq datasets. FlowGrid implements a fast density-based clustering algorithm originally designed for flow cytometry data analysis. We introduce a new automated parameter tuning procedure, and show that FlowGrid can achieve comparable clustering accuracy as state-of-the-art clustering algorithms but at a substantially reduced run time for very large single cell RNA-seq datasets. For example, FlowGrid can complete a one-hour clustering task for one million cells in about five min. https://github.com/holab-hku/FlowGrid. Supplementary data are available at Bioinformatics online.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.