Abstract

Background

High-throughput single-cell transcriptomic technology produces massive high-dimensional data, enabling high-resolution cell type definition and identification. To uncover the expression patterns hidden in these large datasets, an algorithm for searching the transcriptional landscape at single-cell resolution is desirable.

Results

We explored the feasibility of using the DenseFly algorithm for cell searching on scRNA-seq data. DenseFly is a locality-sensitive hashing algorithm inspired by the fruit fly olfactory system. The experiments indicate that DenseFly outperforms the baseline methods FlyHash and SimHash in classification tasks, and that its performance is robust to dropout events and batch effects.

Conclusion

We developed a method for mapping cells across scRNA-seq datasets based on the DenseFly algorithm. It can serve as an efficient tool for searching cell atlases.
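The three hashing schemes named above can be illustrated with a minimal NumPy sketch, written from our reading of the published descriptions rather than from the authors' code. It assumes mean-centred expression vectors, a sparse binary expansion matrix in which each row sums a few randomly chosen inputs, a winner-take-all step for FlyHash, and a zero threshold for DenseFly. The helper names and all parameter values (d, k, the expansion factor of 20, six inputs per row) are hypothetical, and no hash-table indexing stage is reproduced.

```python
import numpy as np


def simhash(x, proj):
    """SimHash: one bit per row of a dense Gaussian projection (the sign of proj @ x)."""
    return (proj @ x > 0).astype(np.uint8)


def sparse_binary_projection(d, m, n_samples, rng):
    """Random binary m x d matrix; each of the m rows sums n_samples randomly chosen inputs."""
    M = np.zeros((m, d))
    for i in range(m):
        M[i, rng.choice(d, size=n_samples, replace=False)] = 1.0
    return M


def flyhash(x, M, k):
    """FlyHash: expand to m dimensions, then winner-take-all keeps the k largest activations."""
    activations = M @ x
    h = np.zeros(M.shape[0], dtype=np.uint8)
    h[np.argsort(activations)[-k:]] = 1
    return h


def densefly(x, M):
    """DenseFly-style hash (assumed zero threshold): same expansion as FlyHash,
    but every activation above zero becomes a 1 instead of only the top k.

    Assumes x is mean-centred so that activations fall on both sides of zero.
    """
    return (M @ x > 0).astype(np.uint8)


def hamming(h1, h2):
    """Number of differing bits between two binary hashes."""
    return int(np.count_nonzero(h1 != h2))


rng = np.random.default_rng(0)
d, k = 50, 32            # input dimension; number of SimHash bits and FlyHash winners
m = 20 * k               # expansion factor of 20 (an illustrative choice)
proj = rng.normal(size=(k, d))
M = sparse_binary_projection(d, m, n_samples=6, rng=rng)

x = rng.normal(size=d)                     # mean-centred toy "cell"
x_near = x + 0.05 * rng.normal(size=d)     # slightly perturbed copy of x
x_far = rng.normal(size=d)                 # unrelated vector

for name, f in [("SimHash", lambda v: simhash(v, proj)),
                ("FlyHash", lambda v: flyhash(v, M, k)),
                ("DenseFly", lambda v: densefly(v, M))]:
    print(name, "near:", hamming(f(x), f(x_near)), "far:", hamming(f(x), f(x_far)))
```

In each scheme similar inputs should share more bits than unrelated ones. The DenseFly-style code keeps the high-dimensional expansion of FlyHash but records every active unit, which is why it is dense rather than sparse.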

Highlights

  • High-throughput single-cell transcriptomic technology produces massive high-dimensional data, enabling high-resolution cell type definition and identification

  • The ongoing Human Cell Atlas (HCA) project aims to provide profiles of all human cell types as a reference for future studies and is already

  • As both the query and the reference cell profiles are vast collections of very high-dimensional gene expression vectors, traditional tree-based search methods will be challenged in terms of both time and memory consumption

Summary

Introduction

High-throughput single-cell transcriptomic technology produces massive high-dimensional data, enabling high-resolution cell type definition and identification. Single-cell RNA sequencing (scRNA-seq) technologies measure the transcriptional profiles of individual cells, enabling high-resolution cell-type (and subtype) definition and offering in-depth insight into cell-to-cell variation [1,2,3]. High-throughput scRNA-seq data are accumulating at a massive scale [4]. The ongoing Human Cell Atlas (HCA) project aims to provide profiles of all human cell types as a reference for future studies and is already. As both the query and the reference cell profiles are vast collections of very high-dimensional gene expression vectors (e.g., up to ~10,000 gene expression features for millions of reference cells), traditional tree-based search methods will be challenged in terms of both time and memory consumption. Several studies have addressed mapping or searching cells across datasets, for example scmap [9] and CellAtlasSearch [10], and comparisons between methods are available [11].
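The underlying task, assigning each query cell a label drawn from an annotated reference, can be sketched with binary hashes and a Hamming-distance k-nearest-neighbour vote. This is illustrative only and is not the published pipeline. The simulated two-type data, the helper names `densefly_codes` and `map_cells`, centring on reference gene means, the zero-threshold hash, and the 5-neighbour majority vote are all assumptions.

```python
import numpy as np
from collections import Counter


def densefly_codes(X, M):
    """Hypothetical helper: hash every row of X (cells x genes) into a binary code.

    Assumes X is already normalised and centred on the reference gene means.
    """
    return (X @ M.T > 0).astype(np.uint8)


def map_cells(query_codes, ref_codes, ref_labels, n_neighbors=5):
    """Hypothetical helper: majority label of each query cell's Hamming nearest neighbours."""
    predictions = []
    for q in query_codes:
        dist = np.count_nonzero(ref_codes != q, axis=1)  # Hamming distance to every reference cell
        nn = np.argsort(dist, kind="stable")[:n_neighbors]
        predictions.append(Counter(ref_labels[i] for i in nn).most_common(1)[0][0])
    return predictions


rng = np.random.default_rng(1)
n_genes, n_bits, n_samples = 200, 1000, 10
profiles = {"A": rng.normal(size=n_genes), "B": rng.normal(size=n_genes)}  # simulated cell-type means


def simulate(label, n):
    """Simulated cells: a type-specific mean profile plus Gaussian noise."""
    return profiles[label] + 0.5 * rng.normal(size=(n, n_genes))


ref = np.vstack([simulate("A", 100), simulate("B", 100)])
ref_labels = np.array(["A"] * 100 + ["B"] * 100)
query = np.vstack([simulate("A", 10), simulate("B", 10)])
query_truth = ["A"] * 10 + ["B"] * 10

# Sparse binary expansion: each hash bit sums n_samples randomly chosen genes.
M = np.zeros((n_bits, n_genes))
for i in range(n_bits):
    M[i, rng.choice(n_genes, size=n_samples, replace=False)] = 1.0

gene_means = ref.mean(axis=0)  # centre query and reference on the same reference means
ref_codes = densefly_codes(ref - gene_means, M)
query_codes = densefly_codes(query - gene_means, M)

pred = map_cells(query_codes, ref_codes, ref_labels)
accuracy = np.mean([p == t for p, t in zip(pred, query_truth)])
print(f"accuracy: {accuracy:.2f}")
```

The linear scan over the reference is kept for clarity. Because each comparison is a bitwise count on compact codes, it is far cheaper than comparing full expression vectors. A production index would also bin codes to avoid scanning every reference cell.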
