Abstract

Background: With the rapid development of single-cell RNA sequencing technology, it is possible to dissect cell-type composition at high resolution. A number of methods have been developed to identify rare cell types. However, existing methods are still not scalable to large datasets, limiting their utility. To overcome this limitation, we present a new software package, called GiniClust3, which is an extension of GiniClust2 and is significantly faster and more memory-efficient than previous versions.

Results: Using GiniClust3, it takes only about 7 h to identify both common and rare cell clusters from a dataset that contains more than one million cells. Cell type mapping and perturbation analyses show that GiniClust3 can robustly identify cell clusters.

Conclusions: Taken together, these results suggest that GiniClust3 is a powerful tool for identifying both common and rare cell populations and can handle large datasets. GiniClust3 is implemented as an open-source Python package and is available at https://github.com/rdong08/GiniClust3.

Highlights

  • With the rapid development of single-cell RNA sequencing technology, it is possible to dissect cell-type composition at high resolution

  • By using a real single-cell RNA-seq dataset as an example, we show that this new extension, which we call GiniClust3, can efficiently and accurately identify both common and rare cell types

  • Repeating the subsampling 10 times and applying GiniClust3 to each subsampled dataset, we found that most clusters in the subsampled datasets were consistent with the original ones, with a median Normalized Mutual Information (NMI) of 0.81 (Fig. S1d)
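
The subsampling consistency check mentioned in the last highlight can be sketched as follows. This is a minimal illustration, not the authors' implementation: the 80% subsampling fraction, the arithmetic-mean normalization of NMI, and the `recluster` callback (a stand-in for rerunning a clustering method such as GiniClust3 on the subsampled cells) are all assumptions made for the example.

```python
import math
import random
from collections import Counter
from statistics import median


def entropy(labels):
    """Shannon entropy (in nats) of a cluster label assignment."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def nmi(labels_a, labels_b):
    """Normalized Mutual Information between two labelings of the same cells.

    Assumption: mutual information is normalized by the arithmetic mean of
    the two entropies; the source does not state which normalization is used.
    """
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mutual_info = sum(
        (c / n) * math.log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    mean_entropy = (entropy(labels_a) + entropy(labels_b)) / 2
    # Two single-cluster labelings are treated as identical by convention.
    return 1.0 if mean_entropy == 0 else mutual_info / mean_entropy


def subsample_nmi(original_labels, recluster, fraction=0.8, repeats=10, seed=0):
    """Median NMI between original labels and labels from reclustered subsamples.

    `recluster` is a hypothetical callable: it receives the sampled cell
    indices and returns one new cluster label per sampled cell.
    """
    rng = random.Random(seed)
    n = len(original_labels)
    size = max(1, int(fraction * n))
    scores = []
    for _ in range(repeats):
        idx = rng.sample(range(n), size)
        new_labels = recluster(idx)
        scores.append(nmi([original_labels[i] for i in idx], new_labels))
    return median(scores)


if __name__ == "__main__":
    # Toy data: one common cluster (90 cells) and one rare cluster (10 cells).
    original = [0] * 90 + [1] * 10
    # Stand-in "clustering" that simply reproduces the original labels.
    print(subsample_nmi(original, lambda idx: [original[i] for i in idx]))
```

NMI is invariant to how clusters are numbered, which is why it suits comparing an original clustering with an independent rerun on a subsample.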


Summary

Conclusions

With technological developments and protocol improvements, the scale of single-cell RNA-seq experiments is increasing exponentially [23], providing a great opportunity to identify previously unrecognized rare cell types. We have shown that GiniClust3 is an accurate and highly scalable method for detecting rare cell types from large single-cell RNA-seq datasets. GiniClust3 can identify both common and rare cell populations and can efficiently handle large datasets containing more than one million cells. This property is important for comprehensively identifying cell types in large datasets and may be useful for atlas datasets in the future.

  • Project home page: https://github.com/rdong08/GiniClust

  • Operating system: Platform independent

  • Programming language: Python

  • Other requirements: Python 3.0 or higher

  • License: GPL

  • Any restrictions to use by non-academics: License needed
