Abstract

Cancer encompasses many different diseases, all typically involving an accumulation of genetic alterations. Despite the genomic complexity of this group of diseases, cancer genomic studies typically rely on sequence alignment to a "reference genome" prior to analysis. Such approaches can quantify the expression of known, annotated genes. However, they fail to detect fusion genes, novel transcripts, virus insertions, and other genetic insults that may be present in cancer beyond an altered gene expression profile. Here, we develop a workflow for de novo identification and quantification of "outside of reference genome" genetic aberrations. This workflow leverages klue, a method we developed based on the compact de Bruijn graph, which extracts from genomic sequencing data the sequences found in a "tumor" sample but not in a "normal" control sample. We first applied this method to an RNA-seq experiment from the EμSRα-tTA/TetO-MYC mouse model, a MYC-driven autochthonous transgenic mouse model of T-cell acute lymphoblastic lymphoma (T-ALL). We extracted k-mer contigs that are present in the mouse tumor samples but absent from both the spleens of wild-type mice of the same genetic background and the standard mouse reference genome assembly. We then mapped RNA-seq reads from the T-ALL mouse tumor samples to these contigs. The contigs with the highest number of mapped reads (i.e., the highest counts) were run through BLAST to determine their identities. The contigs with the highest counts were the tTA transgenic element sequence and the human MYC transgene. This result showcases the workflow's success in isolating and quantifying "reference-absent" tumor-unique sequences.
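The subtract-then-quantify idea described above can be illustrated with a minimal Python sketch. This is not klue itself. It uses plain Python sets rather than a compact de Bruijn graph, ignores reverse complements and sequencing errors, and builds contigs only within individual reads. The function names (`tumor_unique_kmers`, `contigs_in_read`, `count_contigs`), the toy sequences, and the choice of k = 4 are all assumptions made for illustration.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def kmer_set(seqs, k):
    """Collect the distinct k-mers found across a list of sequences."""
    return {km for s in seqs for km in kmers(s, k)}


def tumor_unique_kmers(tumor_reads, normal_reads, reference, k):
    """Hypothetical helper: k-mers seen in tumor reads but in neither
    the normal reads nor the reference sequences."""
    return (kmer_set(tumor_reads, k)
            - kmer_set(normal_reads, k)
            - kmer_set(reference, k))


def contigs_in_read(read, unique, k):
    """Return maximal stretches of a read covered by consecutive unique k-mers."""
    contigs, start = [], None
    for i in range(len(read) - k + 1):
        if read[i:i + k] in unique:
            if start is None:
                start = i  # a run of unique k-mers begins here
        elif start is not None:
            # the run ended at k-mer i - 1, which spans up to index i + k - 2
            contigs.append(read[start:i + k - 1])
            start = None
    if start is not None:
        contigs.append(read[start:])  # run extends to the end of the read
    return contigs


def count_contigs(tumor_reads, unique, k):
    """Count how many times each tumor-unique contig is observed across reads."""
    counts = Counter()
    for read in tumor_reads:
        counts.update(contigs_in_read(read, unique, k))
    return counts


# Toy data (assumed for illustration only).
reference = ["AAAACCCCGGGG"]
normal_reads = ["AAAACCCC", "CCCCGGGG"]
tumor_reads = ["AAAACCCC", "CCCTTTGG", "CCCTTTGG"]

unique = tumor_unique_kmers(tumor_reads, normal_reads, reference, k=4)
counts = count_contigs(tumor_reads, unique, k=4)
print(counts.most_common())
```

In this toy setting, the highest-count contigs would be the candidates to pass on to a tool such as BLAST for identification, mirroring the ranking step described above.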
Next, we applied klue to whole genome sequencing (WGS), whole exome sequencing (WES), and RNA-seq data from a patient with B-cell follicular lymphoma (with a matched normal specimen) from the Texas Cancer Research Biobank, because these data are unrestricted open access. Tumor-unique contigs in the sequencing reads were extracted and analyzed with BLAST and BLAT. In one case, a nonsynonymous point mutation in the SPI1 (also known as PU.1) oncogene was found. The sequencing reads from the tumor sample contained both the "normal variant" of SPI1 and the mutant SPI1:H268P, suggesting a heterozygous mutation. Altogether, these results suggest that first identifying tumor-unique sequences and then analyzing those sequences post hoc is a viable approach for de novo mutation discovery in cancer genomics.

Citation Format: Delaney K. Sullivan, Eissa Albinali, Mayuko Boffelli, Lior Pachter. De novo mutation discovery in a mouse model and a human patient sample of non-Hodgkin's lymphoma [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 6246.