Abstract

Background: Homologous recombination deficiency (HRD) prevents DNA double-strand breaks from being repaired, so affected cancer cells cannot recover from the damage and die. Previous studies found that patients with HRD-positive ovarian cancer derived greater clinical benefit from treatment with poly ADP-ribose polymerase inhibitors (PARPi). To detect HRD accurately, next-generation sequencing (whole-exome or whole-genome) has been used to identify large-scale genomic aberrations, including telomeric allelic imbalance (HRD-TAI score), loss of heterozygosity (HRD-LOH score), and large-scale state transitions (HRD-LST score). To date, many studies (HRDetect, SigMA, CHORD, and shallowHRD) have sought to determine HRD accurately in pan-cancer cohorts. Subsequent research has shown that PARPi are effective in patients with HRD not only in ovarian cancer but also in breast, prostate, and pancreatic cancer.

Purpose: To develop a machine-learning algorithm that accurately predicts the HRD score in patients with ovarian, breast, prostate, and pancreatic cancer.

Methods: We used whole-genome sequencing (WGS) data from 710 samples of 309 patients and whole-exome sequencing (WES) data from 4,650 samples of 2,193 patients in the pan-cancer cohort of the TCGA (TCGA-OV, TCGA-BRCA, TCGA-PRAD, and TCGA-PAAD). The HRD-TAI, HRD-LST, and HRD-LOH scores were calculated through structural variation analyses. After data cleaning, machine-learning models were trained and tested on 228 of 2,502 samples, and validation was performed on 1,248 of 2,502 samples from the TCGA dataset.

Results: To assess the performance of the machine-learning regressor, we quantified the concordance between predictions and annotations by calculating the R squared (R²).
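For readers unfamiliar with the metrics, the sketch below shows the standard textbook definitions of R² and of the root mean squared error (RMSE) that is reported alongside it. It is an illustration only, not the authors' implementation: the function names `r_squared` and `rmse` and the sample scores are hypothetical and were chosen purely for demonstration.

```python
import math


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares between annotations and predictions
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares of the annotations around their mean
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error between annotations and predictions."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    # Hypothetical HRD scores for illustration; not data from the study
    annotated = [10.0, 25.0, 42.0, 60.0]
    predicted = [12.0, 22.0, 45.0, 58.0]
    print("R squared:", r_squared(annotated, predicted))
    print("RMSE:", rmse(annotated, predicted))
```

An R² close to 1 means the predictions explain most of the variance in the annotated scores, while the RMSE expresses the typical prediction error in the same units as the HRD score itself.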
Training the machine-learning algorithm yielded a high R² (0.904) with a root mean squared error (RMSE) of 7.649 for the pan-cancer cohort.

Conclusions: Our regressor was robust and accurate when applied to four cancer types. Our systematic pan-cancer analysis provided novel insights into the mechanisms of HRD across cancer types, with potential contributions to clinical practice.

Citation Format: Boram Choi, Youn Jin Choi. Next-generation sequencing-based regression algorithm to determine homologous recombination deficiency scores in a pan-cancer cohort [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2024; Part 1 (Regular Abstracts); 2024 Apr 5-10; San Diego, CA. Philadelphia (PA): AACR; Cancer Res 2024;84(6_Suppl):Abstract nr 2328.