Abstract

High-throughput single-cell sequencing technologies hold tremendous potential for defining cell types in an unbiased fashion using gene expression and epigenomic state. A key challenge in realizing this potential is integrating single-cell datasets from multiple protocols, biological contexts, and data modalities into a joint definition of cellular identity. We previously developed an approach, called linked inference of genomic experimental relationships (LIGER), that uses integrative nonnegative matrix factorization to address this challenge. Here, we provide a step-by-step protocol for using LIGER to jointly define cell types from multiple single-cell datasets. The main stages of the protocol are data preprocessing and normalization, joint factorization, quantile normalization and joint clustering, and visualization. We describe how to jointly define cell types from single-cell RNA-seq (scRNA-seq) and single-nucleus ATAC-seq (snATAC-seq) data, but similar steps apply across a wide range of other settings and data types, including cross-species analysis, single-nucleus DNA methylation, and spatial transcriptomics. Our protocol contains examples of expected results, describes common pitfalls, and relies only on our freely available, open-source R implementation of LIGER. We also provide R Markdown tutorials showing the outputs from each individual code segment. The analysis process can be performed in 1-4 h, depending on dataset size, and assumes no specialized bioinformatics training.
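As a rough illustration of the joint factorization stage, the sketch below evaluates the objective commonly written for integrative nonnegative matrix factorization: each dataset E_i is approximated by H_i(W + V_i), with shared factors W, dataset-specific factors V_i, and a penalty weight lambda on the dataset-specific term. It is written in plain Python rather than with the LIGER R package, and the helper names (`matmul`, `inmf_objective`, and so on) and the toy matrices are assumptions made for this example only.

```python
# Illustrative sketch only: evaluates an iNMF-style objective on toy data.
# Every function name here is a hypothetical helper, not part of LIGER.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def sub(a, b):
    """Element-wise difference of two equally shaped matrices."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def frob_sq(a):
    """Squared Frobenius norm."""
    return sum(x * x for row in a for x in row)

def inmf_objective(E, H, V, W, lam):
    """Sum over datasets of ||E_i - H_i (W + V_i)||^2 + lam * ||H_i V_i||^2.

    E[i]: cells x genes, H[i]: cells x k, V[i] and W: k x genes.
    """
    total = 0.0
    for E_i, H_i, V_i in zip(E, H, V):
        residual = sub(E_i, matmul(H_i, add(W, V_i)))
        total += frob_sq(residual) + lam * frob_sq(matmul(H_i, V_i))
    return total

# Toy example: two datasets, two cells each, two genes, k = 1 factor.
W = [[1.0, 2.0]]                          # shared factor
V = [[[0.0, 1.0]], [[1.0, 0.0]]]          # dataset-specific factors
H = [[[1.0], [2.0]], [[1.0], [0.0]]]      # cell loadings
E = [[[1.0, 3.0], [2.0, 6.0]],            # dataset 1 (fit exactly)
     [[2.0, 2.0], [0.0, 1.0]]]            # dataset 2 (one unit of error)

objective = inmf_objective(E, H, V, W, lam=5.0)
print(objective)
```

A larger lambda penalizes the dataset-specific term more heavily, which pushes the factorization toward the shared factors. The actual protocol minimizes this kind of objective under nonnegativity constraints using the R implementation; this snippet only evaluates it for fixed matrices.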

Highlights

  • Identifying the molecular features that define the types and functions of individual cells provides a tremendous opportunity for understanding the genomic blueprint of the human body.

  • A variety of high-throughput single-cell sequencing technologies have emerged, measuring the gene expression, DNA methylation, and chromatin accessibility of individual cells. These data modalities together enable researchers to revisit the conventional classifications of cell types and states in a quantitative, systematic, unbiased fashion.

  • Through the analysis of scRNA-seq data from the bed nucleus of the stria terminalis (BNST), we found significant sexual dimorphism in the gene expression patterns of multiple cell types.

Summary

Introduction

Identifying the molecular features that define the types and functions of individual cells provides a tremendous opportunity for understanding the genomic blueprint of the human body. A joint analysis that included DNA methylation data aided in the interpretation of populations that were difficult to identify from methylation alone and increased our sensitivity for detecting rare cell types.

In a cross-species analysis of mouse and human brain cells, both mouse and human cells should be extracted using the same protocols. If whole-cell transcriptomes are extracted from the mouse cells but only nuclear RNA is extracted from the human cells, the biological variable (species) is confounded with a technical variable (whole-cell versus nuclear extraction protocol). It may still be possible, using LIGER, to identify shared cell types and gene expression signatures with such a batch design, but disentangling biological from technical differences will be challenging.

The choice of library preparation protocol may influence the decision about the number of cells versus the number of reads per cell, because protocols (such as SMART-seq) that sample from all positions within a transcript benefit more from increased coverage than poly(A) priming protocols that capture only the ends of transcripts.
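The confounded batch design described above (species versus extraction protocol) can be detected mechanically from sample metadata before any analysis is run. The following is a minimal sketch in plain Python, assuming each sample is described by a dictionary of annotations; the function name `is_fully_confounded` and the field names `species` and `extraction` are hypothetical and are not part of LIGER.

```python
from collections import defaultdict

def is_fully_confounded(samples, biological, technical):
    """Return True when the two variables map one-to-one across samples.

    A one-to-one mapping with more than one technical level means the
    biological effect cannot be separated from the technical effect.
    """
    bio_to_tech = defaultdict(set)
    tech_to_bio = defaultdict(set)
    for sample in samples:
        bio_to_tech[sample[biological]].add(sample[technical])
        tech_to_bio[sample[technical]].add(sample[biological])
    one_to_one = (all(len(v) == 1 for v in bio_to_tech.values())
                  and all(len(v) == 1 for v in tech_to_bio.values()))
    return one_to_one and len(tech_to_bio) > 1

# Design from the text: each species uses a different extraction protocol.
confounded_design = [
    {"species": "mouse", "extraction": "whole_cell"},
    {"species": "human", "extraction": "nuclear"},
]

# Preferred design: both species use the same extraction protocol.
matched_design = [
    {"species": "mouse", "extraction": "nuclear"},
    {"species": "human", "extraction": "nuclear"},
]

print(is_fully_confounded(confounded_design, "species", "extraction"))
print(is_fully_confounded(matched_design, "species", "extraction"))
```

This check only flags perfect confounding. Partially unbalanced designs, where protocols overlap across species but unevenly, would need a more careful look at the sample counts per combination.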

