Reference-free inference of tumor phylogenies from single-cell sequencing data

Ayshwarya Subramanian, Russell Schwartz

doi:10.1109/iccabs.2014.6863944

Abstract

Effective management and treatment of cancer is greatly complicated by the rapid evolution and resulting heterogeneity of tumors. In prior work, we showed that phylogenetic study of cell populations in single tumors provides a way to make sense of this heterogeneity and to identify robust features of the evolutionary processes of single tumors. The introduction of single-cell sequencing has shown great promise for advancing single-tumor phylogenetics, but the volume and high noise of these data present many challenges for studying tumor evolution, especially with regard to the chromosome abnormalities that typically dominate tumor evolution. We propose a reference-free approach to mining genome sequence reads that allows predictive classification of tumors into heterogeneous types and reconstruction of models of their evolution. The approach extracts k-mer counts from single-cell tumor sequences and uses differences in normalized k-mer frequencies as a proxy for the overall evolutionary distance between distinct cells. It is computationally more efficient in time and space than standard protocols for deriving phylogenetic markers, which first align sequence reads to a reference genome and then process the data downstream to extract meaningful progression markers and use them to construct phylogenetic trees. The approach also provides a way to bypass some of the challenges that the massive genome rearrangements typical of tumor genomes present for reference-based methods. To handle the unique challenges of single-cell sequencing data, we have applied a series of noise correction measures intended to account for biases due to the sequencing technology. We illustrate the method using publicly available tumor single-cell sequencing data. Phylogenies built from these k-mer spectrum distance matrices yield splits that are statistically significant when tested for their ability to partition cells at different stages of cancer.
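The pipeline described in the abstract (count k-mers per cell, normalize, compare spectra, build a tree from the resulting distance matrix) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, Euclidean distance stands in for the unspecified "difference in normalized k-mer frequencies", neighbor joining is a swapped-in tree-building method, and none of the paper's noise-correction measures are included.

```python
import math
from collections import Counter

def kmer_spectrum(reads, k):
    """Count every overlapping k-mer in one cell's reads, skipping k-mers that contain N."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if "N" not in kmer:
                counts[kmer] += 1
    return counts

def normalize(counts):
    """Convert raw counts to relative frequencies so cells sequenced to different depths are comparable."""
    total = sum(counts.values())
    return {kmer: c / total for kmer, c in counts.items()} if total else {}

def spectrum_distance(f1, f2):
    """Euclidean distance between two normalized spectra (the metric is an assumption)."""
    keys = set(f1) | set(f2)
    return math.sqrt(sum((f1.get(x, 0.0) - f2.get(x, 0.0)) ** 2 for x in keys))

def distance_matrix(cells, k):
    """cells: dict of cell name -> list of read strings. Returns (sorted names, square distance matrix)."""
    spectra = {name: normalize(kmer_spectrum(reads, k)) for name, reads in cells.items()}
    names = sorted(spectra)
    return names, [[spectrum_distance(spectra[a], spectra[b]) for b in names] for a in names]

def neighbor_joining(names, dist):
    """Return an unrooted Newick topology (no branch lengths) via standard neighbor joining; needs >= 2 cells."""
    nodes = list(names)
    D = [row[:] for row in dist]
    while len(nodes) > 2:
        n = len(nodes)
        r = [sum(row) for row in D]
        # Q-criterion: join the pair (i, j) minimizing (n - 2) * d(i, j) - r_i - r_j.
        best, bi, bj = None, 0, 1
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * D[i][j] - r[i] - r[j]
                if best is None or q < best:
                    best, bi, bj = q, i, j
        keep = [m for m in range(n) if m not in (bi, bj)]
        # Distance from the new internal node to each remaining node.
        new_row = [(D[bi][m] + D[bj][m] - D[bi][bj]) / 2 for m in keep]
        D = [[D[a][b] for b in keep] for a in keep]
        for row, d in zip(D, new_row):
            row.append(d)
        D.append(new_row + [0.0])
        nodes = [nodes[m] for m in keep] + [f"({nodes[bi]},{nodes[bj]})"]
    return f"({nodes[0]},{nodes[1]});"
```

Calling `distance_matrix(cells, k)` on a dict of per-cell reads and passing the result to `neighbor_joining(names, D)` returns a Newick string. On real single-cell data, k would be much larger than in a toy example, and the count-level bias corrections mentioned above would be applied before normalization.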

Highlights

Effective management and treatment of cancer continues to be complicated by the rapid evolution and resulting heterogeneity of tumors
We conclude that our method offers the advantages of (1) a stand-alone, reference-free approach, (2) easy comparison of multiple tumor genomes, and (3) translating tumor copy number variation into practical use as features for classification and evolutionary inference
In the absence of this technical limitation, we propose an alternative procedure of unsupervised filtering [27] based on minimum variance or interquartile range (IQR) of the normalized data
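Unsupervised filtering by minimum variance or IQR, as named in the highlight, can be sketched as a per-k-mer threshold on how much its normalized frequency varies across cells. This is a hypothetical illustration: the function names, the threshold parameters `min_iqr` and `min_var`, and the choice of the "inclusive" quartile method are assumptions, not details taken from the paper or from [27].

```python
import statistics

def iqr(values):
    """Interquartile range (Q3 - Q1) of one k-mer's normalized frequencies across cells."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 - q1

def filter_features(matrix, min_iqr=None, min_var=None):
    """Keep k-mers whose spread across cells passes the chosen threshold(s).

    matrix maps each k-mer to a list of normalized frequencies, one per cell
    (at least two cells). The thresholds are hypothetical tuning parameters.
    """
    kept = {}
    for kmer, values in matrix.items():
        if min_iqr is not None and iqr(values) < min_iqr:
            continue  # nearly constant across cells: uninformative for separating them
        if min_var is not None and statistics.pvariance(values) < min_var:
            continue
        kept[kmer] = values
    return kept
```

Because both criteria look only at the spread of each feature and never at cell labels, the filter stays unsupervised; the IQR variant is less sensitive than variance to a few cells with extreme, possibly noise-driven frequencies.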

Summary

Introduction

Effective management and treatment of cancer continues to be complicated by the rapid evolution and resulting heterogeneity of tumors. Tumors at the same site of origin and with identical clinical presentation may show wide differences in genomic [3] and functional [4] architectures, leading to a … Recent research [8,9] has shown that driver mutations function through clones of heterogeneous cells that progress through space and time. The search for such clonal populations defining robust progression subtypes, and for the paths leading to them, remains an active area of research. Single-cell sequencing results in large, noisy datasets that are computationally expensive to analyze, and extending prior methods requires customizing them at different levels to accommodate new noise models. With such massive-scale data becoming rapidly available, there is a need for new strategies to exploit them to derive robust progression models.

Methods

Results

Conclusion