Abstract

There is a strong need to eliminate batch-specific differences when integrating single-cell RNA-sequencing (scRNA-seq) datasets generated under different experimental conditions for downstream task analysis. Existing batch correction methods usually transform different batches of cells into one preselected “anchor” batch or a low-dimensional embedding space, and cannot take full advantage of useful information from multiple sources. We present a novel framework, called IMGG, i.e., integrating multiple single-cell datasets through connected graphs and generative adversarial networks (GAN) to eliminate nonbiological differences between different batches. Compared with current methods, IMGG shows excellent performance on a variety of evaluation metrics, and the IMGG-corrected gene expression data incorporate features from multiple batches, allowing for downstream tasks such as differential gene expression analysis.

Highlights

  • Connected Graphs and GenerativeThe maturation of single-cell RNA-sequencing technologies and the continuing decrease in sequencing costs have encouraged the establishment of large-scale projects such as the Human Cell Atlas, which generates transcriptomic data from thousands to millions of cells and almost inevitably involves multiple batches across time points, sequencing technologies, or experimental protocols [1,2]

  • The first is to select a batch as “anchor” and convert other batches to the “anchor” batch, e.g., mutual nearest neighbor pairs (MNNs) [6], iMAP [7], SCALEX [8], etc., which has the advantage that different batches of cells can be converted to one other so that gene expression can be studied under the same experimental conditions, and the disadvantage that it is not possible to fully combine the features of each batch and it is difficult to select an “anchor” batch because the cell types contained in each batch are unknown

  • The other is to transform all batches of data into a low-dimensional space to correct batch effects, e.g., Scanorama [9], Harmony [10], DESC [11], BBKNN [12], etc., which has the advantage of extracting biologically relevant latent features and reducing the impact of noise, and the disadvantage that it cannot be used for differential gene expression analysis

Read more

Summary

Introduction

Connected Graphs and GenerativeThe maturation of single-cell RNA-sequencing (scRNA-seq) technologies and the continuing decrease in sequencing costs have encouraged the establishment of large-scale projects such as the Human Cell Atlas, which generates transcriptomic data from thousands to millions of cells and almost inevitably involves multiple batches across time points, sequencing technologies, or experimental protocols [1,2]. The first is to select a batch as “anchor” and convert other batches to the “anchor” batch, e.g., MNN [6], iMAP [7], SCALEX [8], etc., which has the advantage that different batches of cells can be converted to one other so that gene expression can be studied under the same experimental conditions, and the disadvantage that it is not possible to fully combine the features of each batch and it is difficult to select an “anchor” batch because the cell types contained in each batch are unknown. The other is to transform all batches of data into a low-dimensional space to correct batch effects, e.g., Scanorama [9], Harmony [10], DESC [11], BBKNN [12], etc., which has the advantage of extracting biologically relevant latent features and reducing the impact of noise, and the disadvantage that it cannot be used for differential gene expression analysis

Methods
Results
Conclusion

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.