Abstract

Chronic lymphocytic leukemia (CLL) is a hematological malignancy in which malignant B-cells arise in the bone marrow and circulate in the blood. The disease course in CLL is heterogeneous. Some patients require immediate, aggressive treatment, while others do not require treatment for many years. There are a variety of prognostic and predictive markers for CLL, and research into new markers is ongoing. Our goal is to define multiple informative transcriptome variables for use in epidemiological and clinical studies, providing an agnostic framework to represent the many sources of heterogeneity that exist across CLL patients. Data cleaning and normalization were designed with parametric modeling in mind. After pre-processing, principal component analysis (PCA) was used to define orthogonal, quantitative components, referred to as spectra, to parameterize the transcriptome space. Each patient receives a set of quantitative values, one for each variable. Each of these variables is a multi-gene expression biomarker. Bulk RNA-sequencing was performed on treatment-naïve CD19+/CD5+ sorted B-cells on the HiSeq4000 or NovaSeq platforms. Transcript-based read counts were generated from FASTQ files using Salmon. High-quality genes were selected, read counts were internally normalized, and batch effects were corrected using ComBat. Pre-processing resulted in a final set of 8,895 quality-controlled, autosomal, protein-coding genes. PCA resulted in 13 spectra representing 55.7% of the total variance across all 202 CLL transcriptomes. To assess how well our novel CLL spectra framework captured known molecular markers for prognosis, we investigated associations of the spectra with IGHV mutational status (determined using MiXCR) and CD49d expression. In multivariable analysis, the model including all spectra significantly predicted IGHV mutational status (p < 2.0 × 10⁻²⁰).
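The PCA step described above can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the authors' pipeline: the function name `compute_spectra`, the synthetic matrix, and the choice to center (but not scale) each gene are all hypothetical, and the abstract does not say how the 13 spectra were selected, so the number of spectra is simply passed in as a parameter.

```python
import numpy as np


def compute_spectra(expr, n_spectra):
    """Return per-patient spectra scores, gene loadings, and variance fractions.

    expr: (patients x genes) array of normalized expression values.
    n_spectra: number of leading principal components to keep (an assumption;
    the abstract does not state the selection rule).
    """
    # Center each gene so components describe variation around the mean.
    centered = expr - expr.mean(axis=0)
    # SVD of the centered matrix: rows of vt are the principal axes.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    # One quantitative value per patient per spectrum.
    scores = u[:, :n_spectra] * s[:n_spectra]
    # Gene weights that make each spectrum a multi-gene variable.
    loadings = vt[:n_spectra].T
    # Fraction of total variance carried by each retained component.
    var_frac = (s ** 2 / np.sum(s ** 2))[:n_spectra]
    return scores, loadings, var_frac


# Synthetic stand-in data: 20 "patients" by 50 "genes" (not real CLL data).
rng = np.random.default_rng(0)
expr = rng.normal(size=(20, 50))
scores, loadings, var_frac = compute_spectra(expr, n_spectra=3)
print("variance represented by retained spectra:", var_frac.sum())
```

Because the score columns are mutually orthogonal, they could in principle be entered together as predictor variables in a regression model alongside other covariates, which is the use the abstract describes.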
A highly significant model was also found for quantitative CD49d expression (p < 7.4 × 10⁻³²), although CD49d was not a gene in the framework. From matched germline and tumor DNA sequencing, we identified somatic CNVs and mutations using GATK and Strelka. We then assessed how well our novel framework captured these DNA characteristics. Significant spectra-based models were found that predicted common CLL CNVs (11q23 del, 13q14 del, 17p13 del, and trisomy 12) as well as a complex karyotype (>2 large CNVs) phenotype. The presence of ATM, NOTCH1, and TP53 protein-altering mutations was also independently captured in the framework. An agnostic framework of quantitative spectra (transcriptome variables) was able to identify known expression-based and genomic tumor features. This indicates that spectra provide a flexible intrinsic framework to represent tumor characteristics. Spectra are independent and designed to be used as predictor variables, alongside other covariates, in outcome modeling, and they have the potential to improve both epidemiological and clinical studies.

Citation Format: Julie Ellen Feusier, Rosalie G. Waller, Michael J. Madsen, Brian Avery, Nicola J. Camp. Novel transcriptomic framework captures prognostic and predictive markers in CLL [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2021; 2021 Apr 10-15 and May 17-21. Philadelphia (PA): AACR; Cancer Res 2021;81(13_Suppl):Abstract nr 263.
