Abstract

Rapid advances in single-cell genomics sequencing (SCGS) have allowed researchers to characterize tumor heterogeneity with unprecedented resolution and to reveal the phylogenetic relationships between tumor cells or clones. However, the high error rates of current SCGS data, i.e., false positives, false negatives, and missing bases, severely limit their application. Here, we present a deep learning framework, RDAClone, that recovers genotype matrices from noisy data with an extended robust deep autoencoder, clusters cells into subclones with the Louvain-Jaccard method, and infers the evolutionary relationships between subclones with a minimum spanning tree. Studies on both simulated and real datasets demonstrate its robustness and superiority in data denoising, cell clustering, and evolutionary tree reconstruction, particularly for large datasets.
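As background for the denoising step: the original robust deep autoencoder (RDA) of Zhou and Paffenroth (2017), which RDAClone is described as extending, splits an input matrix X into a part L_D that an autoencoder (encoder E_θ, decoder D_θ) can reconstruct well and a sparse residual S, using the objective and elementwise shrinkage (soft-thresholding) operator below. How RDAClone extends this objective is not described in this excerpt; reading X as the noisy cell-by-site genotype matrix and S as its sparse errors is our assumption.

```latex
\min_{\theta,\,S}\;
  \bigl\lVert L_D - D_\theta\bigl(E_\theta(L_D)\bigr) \bigr\rVert_2
  + \lambda \lVert S \rVert_1
\quad \text{s.t.} \quad X - L_D - S = 0,
\qquad
\bigl[\operatorname{shrink}_\lambda(Y)\bigr]_{ij} =
\begin{cases}
  Y_{ij} - \lambda, & Y_{ij} > \lambda,\\
  Y_{ij} + \lambda, & Y_{ij} < -\lambda,\\
  0, & \text{otherwise.}
\end{cases}
```

In that base formulation, training alternates between fitting the autoencoder to L_D = X - S and updating S = shrink_λ(X - L_D); a larger λ pushes more of X into L_D and leaves S sparser.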

Highlights

Accepted: 22 November 2021

  • Understanding the evolutionary mechanisms related to cancer progression and characterizing intratumor heterogeneity are promising routes for predicting and further controlling cancer progression, metastasis, and treatment responses [1,2,3,4,5]

  • The rapid development of single-cell genomics sequencing (SCGS) offers an unprecedented opportunity to profile the evolutionary relationships between subclones in cancer tissue [15,16,17]

  • The application of current SCGS data has been severely limited by high levels of experimental noise from single-cell isolation, whole-genome amplification, and genome interrogation, including allelic dropout events that induce false-negative (FN) and false-positive (FP) mutations, missing bases resulting from insufficient sequencing coverage, and doublets from the mistaken selection of more than one cell [14,15]
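To make the error types listed above concrete, the sketch below corrupts a binary cell-by-site genotype matrix with false positives, false negatives, and missing entries. The independent per-entry noise model and the function name `add_scs_noise` are assumptions for illustration, not RDAClone's simulation procedure, and doublets are not modeled.

```python
import random


def add_scs_noise(genotypes, fp_rate, fn_rate, missing_rate, seed=0):
    """Return a noisy copy of a binary cell-by-site genotype matrix.

    Hypothetical noise model (assumption, for illustration only):
      * each true 0 becomes 1 with probability fp_rate (false positive),
      * each true 1 becomes 0 with probability fn_rate (false negative,
        e.g. from allelic dropout),
      * each entry is then masked as missing (None) with probability
        missing_rate, mimicking insufficient coverage.
    Doublets are not modeled. The input matrix is not modified.
    """
    rng = random.Random(seed)  # seeded for reproducible simulations
    noisy = []
    for row in genotypes:
        new_row = []
        for g in row:
            if g == 0 and rng.random() < fp_rate:
                g = 1  # false-positive mutation call
            elif g == 1 and rng.random() < fn_rate:
                g = 0  # false-negative (dropped-out) mutation
            if rng.random() < missing_rate:
                g = None  # base not observed
            new_row.append(g)
        noisy.append(new_row)
    return noisy
```

Missing entries are encoded as `None` rather than 0 so that downstream code can distinguish "not observed" from "observed as unmutated".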


Introduction

Understanding the evolutionary mechanisms related to cancer progression and characterizing intratumor heterogeneity are promising routes for predicting and further controlling cancer progression, metastasis, and treatment responses [1,2,3,4,5]. SCG [22], a hierarchical Bayesian model, has been proposed to simultaneously cluster cells into subclusters and infer their genotypes, but it cannot infer the evolutionary relationships between these subclones. These three methods perform analysis tasks under the infinite sites assumption, without considering recurrent mutations. BEAM [24] was developed to improve the quality of SCGS data by exploiting the evolutionary information they contain within a molecular phylogenetic framework. These methods are difficult to scale to large datasets, especially the probabilistic models with exponential time complexity. We applied RDAClone and other widely used methods to both simulated and real datasets, and the results demonstrated the superior performance of RDAClone compared to current state-of-the-art methods.
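The infinite sites assumption mentioned above implies that each mutation arises once and is never lost, so noise-free binary genotypes must fit a perfect phylogeny. For a complete cell-by-mutation matrix with an unmutated (all-zero) root, the classical three-gamete condition tests this: no pair of mutation columns may contain all of the row patterns (1, 0), (0, 1) and (1, 1). The sketch below illustrates that condition and is not part of RDAClone; the function name is hypothetical, and missing entries are not supported.

```python
from itertools import combinations


def admits_perfect_phylogeny(matrix):
    """Three-gamete test for a complete binary cell-by-mutation matrix.

    Assumes the root is unmutated (all zeros). Under the infinite sites
    assumption the matrix fits a perfect phylogeny if and only if no pair
    of mutation columns shows all three patterns (1, 0), (0, 1), (1, 1).
    """
    conflict = {(1, 0), (0, 1), (1, 1)}
    n_sites = len(matrix[0])
    for p, q in combinations(range(n_sites), 2):
        # Set of (site p, site q) genotype pairs observed across cells.
        patterns = {(row[p], row[q]) for row in matrix}
        if conflict <= patterns:
            return False  # these two mutations cannot be placed on one tree
    return True
```

A single false positive or recurrent mutation is enough to break the condition: `[[1, 1], [1, 0], [0, 0]]` passes, while changing the last row to `[0, 1]` makes it fail.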

RDAClone Model
Extended RDA
Identification of Subclones by Louvain-Jaccard Clustering
Construct Subclone Evolutionary Tree by Minimum Spanning Tree Method
Datasets and Preprocessing
Evaluation Metrics
Model Evaluation and Comparison on the Simulated Datasets
Methods
RDAClone Works Well on a Real scSNV Dataset with a High Missing Rate
Discussions
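The outline above names two downstream steps, Louvain-Jaccard clustering and a minimum-spanning-tree subclone tree. The following is a minimal standard-library sketch of both under stated assumptions: cells are complete binary genotype vectors (for example, after denoising), distances are Hamming distances, the Jaccard-weighted kNN graph follows one common construction, and the Louvain optimization itself is left out (it would be run on the returned weighted edges, for example with NetworkX's `louvain_communities`). All function names are hypothetical, and none of this is claimed to match RDAClone's implementation.

```python
def hamming(a, b):
    """Number of sites at which two binary genotype vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def jaccard_knn_graph(cells, k):
    """Weighted kNN graph with Jaccard-index edge weights (assumed construction).

    Each cell's neighbor set N_i is the cell itself plus its k nearest cells
    by Hamming distance (ties broken by index). Cells i and j are joined if
    either is among the other's k nearest neighbors, with weight
    |N_i & N_j| / |N_i | N_j|. Louvain clustering (omitted here) would be
    run on these weighted edges.
    """
    n = len(cells)
    neigh = []
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: (hamming(cells[i], cells[j]), j))
        neigh.append({i, *others[:k]})
    weights = {}
    for i in range(n):
        for j in neigh[i] - {i}:
            a, b = min(i, j), max(i, j)
            weights[(a, b)] = len(neigh[a] & neigh[b]) / len(neigh[a] | neigh[b])
    return weights


def consensus_genotypes(cells, labels):
    """Majority-vote genotype per cluster label (ties resolved to 0)."""
    groups = {}
    for cell, lab in zip(cells, labels):
        groups.setdefault(lab, []).append(cell)
    return {lab: [int(2 * sum(col) > len(col)) for col in zip(*members)]
            for lab, members in groups.items()}


def minimum_spanning_tree(genotypes):
    """Prim's algorithm on the complete Hamming-distance graph of subclones.

    Returns a list of (u, v, distance) edges, growing the tree from the
    smallest label; ties between equal-weight edges go to the smaller labels.
    """
    labels = sorted(genotypes)
    in_tree = {labels[0]}
    edges = []
    while len(in_tree) < len(labels):
        # Cheapest edge crossing from the current tree to a new subclone.
        d, u, v = min((hamming(genotypes[u], genotypes[v]), u, v)
                      for u in in_tree for v in labels if v not in in_tree)
        edges.append((u, v, d))
        in_tree.add(v)
    return edges
```

Prim's algorithm is used here only because it is short; for a handful of subclones the cubic cost of this naive version is negligible, and Kruskal's algorithm would yield a spanning tree of the same total weight.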