Abstract

Ductal carcinoma in situ (DCIS) is a preinvasive form of breast cancer with a highly variable potential to become invasive and affect patient mortality. Due to the lack of accurate markers of disease progression, many women with detected DCIS are currently overtreated. To distinguish DCIS cases that are likely to require therapy from those that should be left untreated, there is a need for robust and predictive biomarkers extracted from molecular or genetic profiles. We developed a supervised machine learning approach that implements multi-omics feature selection and model regularization for the identification of biomarker combinations that could be used to distinguish low-risk DCIS lesions from those with a higher likelihood of progression. To investigate the genetic heterogeneity of disease progression, we applied this approach to 40 pure DCIS and 259 invasive breast cancer (IBC) samples profiled with genome-wide transcriptomics, DNA methylation, and DNA copy number variation. Feature selection using the multi-omics Lasso-regularized algorithm identified both known genes involved in breast cancer development and novel markers for early detection. Although gene expression-based features alone gave the highest classification accuracy, methylation data provided a complementary source of features and improved, in particular, the sensitivity of classifying DCIS cases. We also identified a number of DCIS cases that were repeatedly misclassified when using either the expression or the methylation markers. A small panel of 10 gene markers was able to distinguish DCIS and IBC cases with high accuracy in nested cross-validation (AU-ROC = 0.99). The marker panel was not specific to any of the established breast cancer subtypes, suggesting that the 10-gene signature may provide a subtype-agnostic and cost-effective approach for breast cancer detection and patient stratification. We further confirmed the high accuracy of the 10-gene signature in an external validation cohort (AU-ROC = 0.95) profiled using a distinct transcriptomic assay, demonstrating the robustness of the risk signature.
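
As a reading aid for the AU-ROC values quoted above, the following is a minimal sketch of how an area under the ROC curve can be computed from classifier scores by pairwise comparison: it is the fraction of (positive, negative) sample pairs in which the positive sample receives the higher score. The function name `au_roc`, the choice of IBC as the positive class, and the toy scores are illustrative assumptions and are not taken from the study or its code.

```python
from typing import Sequence


def au_roc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    labels: 1 for the positive class (assumed here to be IBC),
            0 for the negative class (assumed here to be DCIS).
    scores: classifier outputs; higher means "more likely positive".
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy, made-up scores for illustration only (not data from the study).
toy_labels = [0, 0, 0, 1, 1, 1, 1]
toy_scores = [0.10, 0.35, 0.60, 0.55, 0.70, 0.80, 0.95]
toy_auc = au_roc(toy_labels, toy_scores)
```

In this toy example, 11 of the 12 positive/negative pairs are ranked correctly, so `toy_auc` is 11/12. An AU-ROC of 0.99 would mean that nearly every such pair is ordered correctly by the model.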

Highlights

  • Ductal carcinoma in situ (DCIS) is a non-invasive precursor to invasive breast cancer (IBC) with a low risk of progression (Cowell et al, 2013)

  • DNA copy number variation profiles showed the poorest performance among the three omics datasets, even though the Lasso model selected the largest number of copy number features, suggesting that copy number changes do not contain a sufficient predictive signal for the classification between DCIS and IBC cases

  • In our multi-omics classification analysis between DCIS and IBC, gene expression-based features alone gave the highest classification accuracy; methylation data provided a complementary source of predictive signal and improved, in particular, the sensitivity of classifying DCIS cases, which is important for the clinical application of risk signatures

Introduction

Ductal carcinoma in situ (DCIS) is a non-invasive precursor to invasive breast cancer (IBC) with a low risk of progression (Cowell et al, 2013). Recent advances in breast cancer screening have resulted in an increasing number of women with detected DCIS lesions (Virnig et al, 2010; Seely and Alhassan, 2018; van Seijen et al, 2019), many of which will never progress to invasive disease (Page et al, 1982, 1995; Nielsen et al, 1984; Collins et al, 2005; Sanders et al, 2005). The diagnostic classification carries considerable uncertainty, and DCIS lesions may range from indolent lesions to tumors on the verge of becoming invasive (Gorringe and Fox, 2017). Due to this uncertainty, treatment for DCIS is often extensive, resulting in substantial overtreatment (Esserman et al, 2014; Groen et al, 2017).
