Abstract
Motivation: With an increasing number of patient-derived xenograft (PDX) models being created and subsequently sequenced to study tumor heterogeneity and to guide therapy decisions, there is a similarly increasing need for methods to separate reads originating from the graft (human) tumor and reads originating from the host species’ (mouse) surrounding tissue. Two kinds of methods are in use: On the one hand, alignment-based tools require that reads are mapped and aligned (by an external mapper/aligner) to the host and graft genomes separately first; the tool itself then processes the resulting alignments and quality metrics (typically BAM files) to assign each read or read pair. On the other hand, alignment-free tools work directly on the raw read data (typically FASTQ files). Recent studies compare different approaches and tools, with varying results.
Results: We show that alignment-free methods for xenograft sorting are superior concerning CPU time usage and equivalent in accuracy. We improve upon the state of the art sorting by presenting a fast lightweight approach based on three-way bucketed quotiented Cuckoo hashing. Our hash table requires memory comparable to an FM index typically used for read alignment and less than other alignment-free approaches. It allows extremely fast lookups and uses less CPU time than other alignment-free methods and alignment-based methods at similar accuracy. Several engineering steps (e.g., shortcuts for unsuccessful lookups, software prefetching) improve the performance even further.
Availability: Our software xengsort is available under the MIT license at http://gitlab.com/genomeinformatics/xengsort. It is written in numba-compiled Python and comes with sample Snakemake workflows for hash table construction and dataset processing.
Highlights
To learn about tumor heterogeneity and tumor progression under realistic in vivo conditions, but without putting human life at risk, one can implant human tumor tissue into a mouse and study its evolution
We show that alignment-free methods for xenograft sorting are superior concerning CPU time usage and equivalent in accuracy
We improve upon the state of the art sorting by presenting a fast lightweight approach based on three-way bucketed quotiented Cuckoo hashing
Summary
To learn about tumor heterogeneity and tumor progression under realistic in vivo conditions, but without putting human life at risk, one can implant human tumor tissue into a mouse and study its evolution. This is called a (patient-derived) xenograft (PDX). Several samples of the (graft/human) tumor and surrounding (host/mouse) tissue are taken and subjected to exome or whole genome sequencing in order to monitor the changing genomic features of the tumor. This information can [...] Several tools have been developed for xenograft sorting, motivated by different goals and using different approaches; a summary appears below.
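The alignment-free idea mentioned in the abstract (working directly on raw reads instead of on alignments) can be illustrated with a toy sketch. This is not the xengsort implementation: it replaces the three-way bucketed quotiented Cuckoo hash table with a plain Python dict, ignores reverse complements, and uses hypothetical names (`kmers`, `build_index`, `classify_read`), a hypothetical `min_fraction` threshold, and simplified decision rules chosen only for illustration.

```python
from collections import Counter


def kmers(seq, k):
    """Yield all k-mers of seq that consist only of A, C, G, T."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            yield kmer


def build_index(host_ref, graft_ref, k):
    """Map each reference k-mer to 'host', 'graft', or 'both'.

    A plain dict stands in for the paper's specialized hash table.
    """
    index = {}
    for label, ref in (("host", host_ref), ("graft", graft_ref)):
        for kmer in kmers(ref, k):
            prev = index.get(kmer)
            index[kmer] = label if prev in (None, label) else "both"
    return index


def classify_read(read, index, k, min_fraction=0.5):
    """Assign a read to a class by counting the labels of its k-mers.

    The rules below are illustrative assumptions, not the published ones.
    """
    votes = Counter(index.get(kmer, "neither") for kmer in kmers(read, k))
    total = sum(votes.values())
    if total == 0:
        return "neither"  # read too short or only ambiguous bases
    host, graft, both = votes["host"], votes["graft"], votes["both"]
    if host and graft:
        return "ambiguous"  # evidence for both species
    if host and (host + both) / total >= min_fraction:
        return "host"
    if graft and (graft + both) / total >= min_fraction:
        return "graft"
    if both / total >= min_fraction:
        return "both"
    return "neither"


# Tiny made-up "genomes" that share the k-mers TTTGA and TTGAC.
host_ref = "ACGTACGTTTGACCA"
graft_ref = "GGGCCCAAATTTGAC"
index = build_index(host_ref, graft_ref, k=5)

for read in ("ACGTACGT", "GGGCCCAAA", "TTTGAC", "ACGTAGGGCC"):
    print(read, classify_read(read, index, k=5))
```

In a realistic setting the index would hold billions of k-mers, which is where a compact hash table with fast lookups, as described in the abstract, matters; the dict here only keeps the sketch short.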