Abstract

Background: Effective management and treatment of cancer continues to be complicated by the rapid evolution and resulting heterogeneity of tumors. Phylogenetic study of cell populations in single tumors provides a way to delineate intra-tumoral heterogeneity and identify robust features of evolutionary processes. The introduction of single-cell sequencing has shown great promise for advancing single-tumor phylogenetics; however, the volume and high noise in these data present challenges for inference, especially with regard to chromosome abnormalities that typically dominate tumor evolution. Here, we investigate a strategy to use such data to track differences in tumor cell genomic content during progression.

Results: We propose a reference-free approach to mining single-cell genome sequence reads to allow predictive classification of tumors into heterogeneous cell types and reconstruct models of their evolution. The approach extracts k-mer counts from single-cell tumor genomic DNA sequences, and uses differences in normalized k-mer frequencies as a proxy for overall evolutionary distance between distinct cells. The approach computationally simplifies deriving phylogenetic markers, which normally relies on first aligning sequence reads to a reference genome and then processing the data to extract meaningful progression markers for constructing phylogenetic trees. The approach also provides a way to bypass some of the challenges that massive genome rearrangement typical of tumor genomes presents for reference-based methods. We illustrate the method on a publicly available breast tumor single-cell sequencing dataset.

Conclusions: We have demonstrated a computational approach for learning tumor progression from single-cell sequencing data using k-mer counts. k-mer features classify tumor cells by stage of progression with high accuracy. Phylogenies built from these k-mer spectrum distance matrices yield splits that are statistically significant when tested for their ability to partition cells at different stages of cancer.
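The k-mer comparison described in the abstract can be illustrated with a minimal sketch. The abstract does not state the normalization or the distance measure, so dividing by the total k-mer count and using Euclidean distance are assumptions made here for illustration only. The function names and the default k are hypothetical.

```python
from collections import Counter
from itertools import product
import math


def kmer_counts(reads, k):
    """Count length-k substrings over A/C/G/T across all reads of one cell."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Skip windows containing N or other ambiguity codes.
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1
    return counts


def normalized_frequencies(counts, k):
    """Return a fixed-order frequency vector over all 4**k possible k-mers.

    Assumption: normalization divides each count by the cell's total count,
    so cells sequenced at different depths become comparable.
    """
    total = sum(counts.values())
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def distance_matrix(cells, k=3):
    """Pairwise distances between cells, each given as a list of read strings."""
    profiles = [normalized_frequencies(kmer_counts(reads, k), k) for reads in cells]
    n = len(profiles)
    return [[euclidean(profiles[i], profiles[j]) for j in range(n)] for i in range(n)]
```

A matrix like this could then be passed to any distance-based tree-building method. Because the profiles are frequencies and not raw counts, two cells with the same composition but different read depth end up at distance zero.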

Highlights

  • We conclude that our method offers the advantages of (1) a stand-alone, reference-free approach, (2) easy comparison of multiple tumor genomes, and (3) translation of tumor copy number variation into practical use as features for classification and evolutionary inference

  • In the absence of this technical limitation, we propose an alternate procedure of unsupervised filtering [27] based on minimum variance or interquartile range (IQR) on normalized data
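As a rough illustration of the IQR-based variant of unsupervised filtering mentioned in the last highlight, the sketch below keeps only features whose interquartile range across cells reaches a threshold. The data layout (rows are cells, columns are normalized k-mer features), the function names, and the threshold value are all hypothetical, and this is not necessarily the procedure of reference [27].

```python
import statistics


def iqr(values):
    """Interquartile range using the default quartile method of statistics.quantiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1


def filter_features_by_iqr(matrix, feature_names, min_iqr):
    """Drop features that barely vary across cells.

    matrix: list of rows, one per cell; each row holds normalized feature values.
    feature_names: labels for the columns, in the same order.
    min_iqr: hypothetical cutoff; features with a smaller IQR are removed.
    """
    columns = list(zip(*matrix))
    keep = [j for j, col in enumerate(columns) if iqr(col) >= min_iqr]
    kept_names = [feature_names[j] for j in keep]
    filtered = [[row[j] for j in keep] for row in matrix]
    return filtered, kept_names
```

For example, `filter_features_by_iqr(matrix, names, min_iqr=0.05)` would discard any k-mer whose frequency is nearly constant across cells. A minimum-variance filter would follow the same pattern with `statistics.variance` in place of `iqr`.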


Summary

Introduction

Effective management and treatment of cancer continues to be complicated by the rapid evolution and resulting heterogeneity of tumors. Tumors at the same site of origin and with identical clinical presentation may show wide differences in genomic [3] and functional [4] architectures, leading to a [...] Recent research [8,9] has shown that driver mutations function through clones of heterogeneous cells which progress through space and time. The search for such clonal populations defining robust progression subtypes, and for the paths leading to them, remains an active area of research. Single-cell sequencing results in large, noisy datasets that are computationally expensive to analyze, and extending prior methods requires customizing them at different levels to accommodate new noise models. With such massive-scale data rapidly becoming available, there is a need for new strategies to exploit them to derive robust progression models.
