Machine learning for RNA sequencing-based intrinsic subtyping of breast cancer

Silvia Cascianelli, Enzo Medico, Marco Masseroli, Claudio Isella, Ivan Molineris

doi:10.1038/s41598-020-70832-2

Abstract

Stratification of breast cancer (BC) into molecular subtypes by multigene expression assays is of demonstrated clinical utility. In principle, global RNA-sequencing (RNA-seq) should enable reconstructing existing transcriptional classifications of BC samples. Yet, it is not clear whether it is preferable to adapt to RNA-seq the classifiers originally developed using PCR or microarrays, or to reconstruct them through machine learning (ML). Hence, we focused on the robustness and portability of PAM50, a nearest-centroid classifier developed on microarray data to identify five BC “intrinsic subtypes”. We found that standard PAM50 is profoundly affected by the composition of the sample cohort used for reference construction, and we propose a strategy, named AWCA, to mitigate this issue, improving classification robustness, with over 90% concordance, and prognostic ability; we also show that AWCA-based PAM50 can even be applied as a single-sample method. Furthermore, we explored five supervised learners to build robust, single-sample intrinsic subtype callers via RNA-seq. In our ML-based survey, regularized multiclass logistic regression (mLR) displayed the best performance, which was further increased by ad hoc gene selection on the global transcriptome. On external test sets, mLR classifications reached 90% concordance with PAM50-based calls, without the need for reference samples; the proven robustness and prognostic ability of mLR make it an equally valuable single-sample method to strengthen BC subtyping.
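
As an illustration of the regularized multiclass logistic regression (mLR) idea named in the abstract, the following is a minimal, dependency-free sketch of a softmax classifier trained by gradient descent. Everything specific in it is an assumption made for illustration: the L2 penalty, the learning rate, the two-feature toy profiles, and the function names `fit_mlr` and `predict` are hypothetical and are not taken from the paper or its code repository.

```python
import math

def softmax(scores):
    # Subtract the maximum for numerical stability before exponentiating.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def class_probabilities(x, weights, biases):
    # One linear score per class, turned into probabilities by softmax.
    return softmax([sum(w * v for w, v in zip(row, x)) + b
                    for row, b in zip(weights, biases)])

def fit_mlr(X, y, classes, l2=0.01, lr=0.1, epochs=500):
    """Batch gradient descent on cross-entropy plus (l2/2)*||W||^2.

    The L2 penalty is an assumption; the biases are left unpenalized.
    """
    n_features = len(X[0])
    weights = [[0.0] * n_features for _ in classes]
    biases = [0.0] * len(classes)
    for _ in range(epochs):
        grad_w = [[l2 * w for w in row] for row in weights]
        grad_b = [0.0] * len(classes)
        for x, label in zip(X, y):
            probs = class_probabilities(x, weights, biases)
            for k, cls in enumerate(classes):
                err = (probs[k] - (1.0 if cls == label else 0.0)) / len(X)
                grad_b[k] += err
                for j, value in enumerate(x):
                    grad_w[k][j] += err * value
        for k in range(len(classes)):
            biases[k] -= lr * grad_b[k]
            for j in range(n_features):
                weights[k][j] -= lr * grad_w[k][j]
    return weights, biases

def predict(x, weights, biases, classes):
    probs = class_probabilities(x, weights, biases)
    return classes[probs.index(max(probs))], probs

# Toy training profiles with two made-up features per sample.
classes = ["LumA", "Her2", "Basal"]
X = [[2.0, -1.0], [1.5, -1.2], [-1.0, 2.0], [-0.8, 2.2], [-1.5, -1.5], [-1.2, -1.8]]
y = ["LumA", "LumA", "Her2", "Her2", "Basal", "Basal"]
weights, biases = fit_mlr(X, y, classes)

# A new sample is classified on its own, with no reference cohort involved.
call, probs = predict([1.8, -0.9], weights, biases, classes)
```

The point of the sketch is the single-sample property: once the weights are fitted, a new profile is scored directly, without centering it against other samples.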

Highlights

Breast cancer (BC) is the most common cancer in women worldwide, and in about 80% of cases it is invasive, i.e. it breaks through the walls of the glands or ducts where it originated and grows into surrounding breast tissue.
Although these groups first emerged from unsupervised hierarchical clustering of global microarray gene expression profiles[1], breast cancer (BC) classification into intrinsic subtypes is primarily achieved by measuring the expression of a set of only 50 genes, the so-called “PAM50 panel”[15].
At https://github.com/DEIB-GECO/BC_Intrinsic_subtyping we make publicly available the R code to perform single-sample PAM50 classifications using precomputed “average of within-class averages” (AWCA) references, and to build AWCA references on any expression data, even from other platforms, as we successfully did with Affymetrix microarray data.
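
The name “average of within-class averages” suggests how such a reference can be computed: average each gene within each class first, then average those class means, so that the result does not depend on how many samples each class contributes. The sketch below illustrates this in Python rather than in the R of the linked repository. The function names, the toy expression values, and the choice of class labels are assumptions for illustration and do not reproduce the authors' code.

```python
from statistics import mean

def awca_reference(samples, labels):
    """Per-gene average of within-class averages.

    samples: list of dicts mapping gene -> log-scale expression
    labels:  one class label per sample (the grouping is an assumption here)
    """
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    reference = {}
    for gene in samples[0]:
        # First average within each class, then average the class means.
        class_means = [mean(s[gene] for s in group) for group in by_class.values()]
        reference[gene] = mean(class_means)
    return reference

def center_sample(sample, reference):
    # Single-sample use: subtract the precomputed reference gene by gene.
    return {gene: value - reference[gene] for gene, value in sample.items()}

# Toy, deliberately unbalanced cohort: four samples of one class, one of another.
cohort = [{"ESR1": 8.0, "ERBB2": 2.0}] * 4 + [{"ESR1": 2.0, "ERBB2": 6.0}]
labels = ["LumA"] * 4 + ["Basal"]

reference = awca_reference(cohort, labels)
plain_mean = {gene: mean(s[gene] for s in cohort) for gene in cohort[0]}
centered = center_sample({"ESR1": 7.0, "ERBB2": 3.0}, reference)
```

In this toy cohort the plain per-gene mean is pulled toward the over-represented class, while the AWCA-style reference weights both classes equally; that is the cohort-composition effect the reference is meant to dampen.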

Summary

Introduction

Breast cancer (BC) is the most common cancer in women worldwide, and in about 80% of cases it is invasive, i.e. it breaks through the walls of the glands or ducts where it originated and grows into surrounding breast tissue. To explore in detail the potential of RNA-seq in reconstructing a BC classification system originally developed with a different technology, we considered the so-called “intrinsic molecular subtypes” (Luminal A, Luminal B, Normal-like, Her2-Enriched and Basal), which have become part of the common knowledge on the disease and are recognized as prognostically and therapeutically relevant[7]. Although these groups first emerged from unsupervised hierarchical clustering of global microarray gene expression profiles[1], BC classification into intrinsic subtypes is primarily achieved by measuring the expression of a set of only 50 genes, the so-called “PAM50 panel”[15]. Intrinsic subtypes summarize BC biological and molecular features, which are known to involve many more genes than the PAM50 set[20].
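
A nearest-centroid classifier of the PAM50 kind assigns a sample to the subtype whose centroid its expression profile resembles most. The sketch below uses Spearman rank correlation as the similarity measure, which is the usual choice for PAM50 but is stated here as an assumption. The four-gene panel, the centroid values, and the function names are hypothetical and chosen only to keep the example short; real centroids cover the 50-gene panel and five subtypes.

```python
from statistics import mean

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = shared
        i = j + 1
    return result

def spearman(x, y):
    # Spearman correlation is the Pearson correlation of the ranks.
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

def call_subtype(profile, centroids):
    """Return the best-correlated subtype and all correlation scores."""
    genes = sorted(profile)
    x = [profile[g] for g in genes]
    scores = {name: spearman(x, [centroid[g] for g in genes])
              for name, centroid in centroids.items()}
    return max(scores, key=scores.get), scores

# Hypothetical centroids on a toy four-gene panel (values are made up).
centroids = {
    "LumA":  {"ESR1": 2.0,  "ERBB2": -0.5, "KRT5": -1.0, "MKI67": -1.5},
    "Her2":  {"ESR1": -1.0, "ERBB2": 2.5,  "KRT5": -0.5, "MKI67": 0.5},
    "Basal": {"ESR1": -2.0, "ERBB2": -1.0, "KRT5": 2.0,  "MKI67": 1.5},
}

# A reference-centered sample profile (toy values).
profile = {"ESR1": 1.5, "ERBB2": -0.2, "KRT5": -0.8, "MKI67": -1.0}
subtype, scores = call_subtype(profile, centroids)
```

Because the call depends on how the profile was centered before comparison, the choice of reference used for centering directly influences which centroid wins.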

Journal: Scientific Reports	Publication Date: Aug 21, 2020
Citations: 38	License type: open-access
