Abstract

Motivation: Single-cell RNA sequencing (scRNA-seq) measures gene expression at the resolution of individual cells. Massively multiplexed single-cell profiling has enabled large-scale transcriptional analyses of thousands of cells in complex tissues. In most cases, the true identity of individual cells is unknown and needs to be inferred from the transcriptomic data. Existing methods typically cluster (group) cells based on similarities of their gene expression profiles and assign the same identity to all cells within each cluster using the averaged expression levels. However, scRNA-seq experiments typically produce low-coverage sequencing data for each cell, which hinders the clustering process.

Results: We introduce scMatch, which directly annotates single cells by identifying their closest match in large reference datasets. We used this strategy to annotate various single-cell datasets and evaluated the impacts of sequencing depth, similarity metric and reference datasets. We found that scMatch can rapidly and robustly annotate single cells with accuracy comparable to that of another recent cell annotation tool (SingleR), while being quicker and able to handle larger reference datasets. We demonstrate how scMatch can handle large customized reference gene expression profiles that combine data from multiple sources, thus empowering researchers to identify cell populations in any complex tissue with the desired precision.

Availability and implementation: scMatch (Python code) and the FANTOM5 reference dataset are freely available to the research community at https://github.com/forrest-lab/scMatch.

Supplementary information: Supplementary data are available at Bioinformatics online.

Highlights

  • Although whole-transcriptome analysis of single cells has been possible since 2009 (Tang et al, 2009), only recently has it become broadly applied in the research community

  • Barcode-based tracking methods allow us to profile gene expression in thousands of cells. This advance in single-cell profiling is enabling the characterization of the diverse cell types that make up various tissues (Regev et al, 2018) and the study of biological processes, such as cell development (Bendall et al, 2014; Klein et al, 2015; Setty et al, 2016; Trapnell et al, 2014), cell state transition and multi-cellular interactions (Tay et al, 2010; Thompson et al, 2014; Wang et al, 2014)

  • We show that the choice of correlation measure, the sequencing depth, the cell types in question and the reference database all have an impact on the annotation accuracy
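The matching strategy described in the abstract, in which each cell is assigned the label of its most similar reference profile, can be illustrated with a minimal sketch. This is not scMatch's actual code: the function names, the toy expression values and the choice of Spearman rank correlation as the similarity measure are all assumptions made for illustration only.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant profile: correlation undefined, treated as no similarity
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def annotate_cell(cell, reference, metric=spearman):
    """Return (label, score) of the reference profile most similar to `cell`."""
    scored = ((label, metric(cell, profile)) for label, profile in reference.items())
    return max(scored, key=lambda pair: pair[1])


# Hypothetical reference: one expression vector per cell type over the same 5 genes.
reference = {
    "T cell": [9.0, 8.0, 0.5, 0.1, 1.0],
    "B cell": [0.5, 1.0, 9.0, 7.0, 1.5],
    "monocyte": [0.2, 0.4, 0.3, 1.0, 8.0],
}

# Hypothetical low-coverage cell: few counts, with zeros for undetected genes.
cell = [3.0, 2.0, 0.0, 0.0, 1.0]

label, score = annotate_cell(cell, reference)
print(label, round(score, 3))
```

Passing `metric=pearson` instead of the default swaps the correlation measure, which is one of the factors the highlights name as affecting annotation accuracy.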



Introduction

Although whole-transcriptome analysis of single cells has been possible since 2009 (Tang et al, 2009), only recently has it become broadly applied in the research community. This is due to the development of new massively multiplexed single-cell RNA sequencing (scRNA-seq) protocols (Han et al, 2018; Hashimshony et al, 2012; Macosko et al, 2015; Picelli et al, 2013; Rosenberg et al, 2018) and the broad availability of commercial platforms for generating these libraries. Barcode-based tracking methods (molecular-, cellular- and plate-level tags) allow us to profile gene expression in thousands of cells. This advance in single-cell profiling is enabling the characterization of the diverse cell types that make up various tissues (Regev et al, 2018) and the study of biological processes, such as cell development (Bendall et al, 2014; Klein et al, 2015; Setty et al, 2016; Trapnell et al, 2014), cell state transition (da Rocha et al, 2018; Haghverdi et al, 2016; Shin et al, 2015; Treutlein et al, 2014) and multi-cellular interactions (Tay et al, 2010; Thompson et al, 2014; Wang et al, 2014).

