Abstract

Early detection of breast cancer and correct determination of its stage are important for prognosis and for providing appropriate personalized clinical treatment to breast cancer patients. Despite considerable efforts and progress, there is still a need to identify the specific genomic factors responsible for, or accompanying, the progression stages of Invasive Ductal Carcinoma (IDC), which can aid correct stage determination. We have developed two-class machine-learning classification models to differentiate the early and late stages of IDC. The prediction models are trained on RNA-seq gene expression profiles representing different IDC stages from 610 patients, obtained from The Cancer Genome Atlas (TCGA). Different supervised learning algorithms were trained and evaluated, with model learning enriched by different feature selection methods. We also developed a machine-learning classifier trained on the same datasets, with the training data reduced to the expression of IDC driver genes. Based on these two classifiers, we have developed a web server, Duct-BRCA-CSP, that distinguishes early-stage from late-stage IDC from input RNA-seq gene expression profiles. Our analysis also provides deeper insight into the stage-dependent molecular events accompanying IDC progression. The server is publicly available at http://bioinfo.icgeb.res.in/duct-BRCA-CSP.

Highlights

  • Breast cancer is the most prevalent cancer in women and ranks second among all cancer types by death rate[1]

  • PET and MR imaging techniques are available for early detection of breast cancer, but they are based on morphological features and give no clue to the molecular events accompanying cancer progression

  • TNM classification overlaps with breast cancer stages, where T describes the extent of a primary tumour by the size or depth of invasion mainly in stage I or II, N describes the extent of regional lymph node metastasis in mainly stage II or III, and M describes the presence of metastasis mainly in stage IV23

Introduction

Breast cancer is the most prevalent cancer in women and ranks second among all cancer types by death rate[1]. Gene expression based analyses can capture early-stage markers and detect the molecular events and pathways that drive the disease from early to late stage. Such information can help identify patients who would require targeted or personalized therapy. Potential treatment options are available based on clinical and pathological prognostic factors, with histological grade being the most important predictive factor[23]. High-throughput techniques such as Next Generation Sequencing (NGS), which capture the expression of thousands of genes in a single assay, can act as powerful analytical tools for capturing breast cancer prognostic signatures[20]. Based on the most comprehensive ranking of gene features by various feature selection methods, the top gene features were selected for enriched classifier training, which helped us efficiently classify tumours by their stage-specific gene expression profiles.
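The idea of ranking genes with several feature selection methods and training a classifier on the top-ranked genes can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the two scoring functions (absolute mean difference and an absolute Welch t-statistic), the mean-rank aggregation, the nearest-centroid classifier, the gene names, and the toy expression values are hypothetical stand-ins, not the feature selection methods, algorithms, or data used in the study.

```python
from statistics import mean, variance

def mean_diff_score(early, late):
    """Absolute difference between the class means of one gene."""
    return abs(mean(early) - mean(late))

def welch_t_score(early, late):
    """Absolute Welch t-statistic of one gene (epsilon guards against zero variance)."""
    se = (variance(early) / len(early) + variance(late) / len(late)) ** 0.5
    return abs(mean(early) - mean(late)) / (se + 1e-12)

def rank_genes(scores):
    """Map gene -> rank (1 = best); higher score is better, ties broken by gene name."""
    ordered = sorted(scores, key=lambda g: (-scores[g], g))
    return {g: i + 1 for i, g in enumerate(ordered)}

def select_top_genes(expr, labels, k):
    """Rank genes with each scorer, average the ranks, and keep the k best genes."""
    per_method = []
    for scorer in (mean_diff_score, welch_t_score):
        scores = {}
        for gene, values in expr.items():
            early = [v for v, y in zip(values, labels) if y == "early"]
            late = [v for v, y in zip(values, labels) if y == "late"]
            scores[gene] = scorer(early, late)
        per_method.append(rank_genes(scores))
    mean_rank = {g: mean(r[g] for r in per_method) for g in expr}
    return sorted(expr, key=lambda g: (mean_rank[g], g))[:k]

def class_centroids(expr, labels, genes):
    """Per-class mean expression over the selected genes."""
    return {
        cls: [mean(v for v, y in zip(expr[g], labels) if y == cls) for g in genes]
        for cls in set(labels)
    }

def predict(sample, centroids, genes):
    """Assign the class whose centroid is nearest in squared Euclidean distance."""
    def dist(centroid):
        return sum((sample[g] - m) ** 2 for g, m in zip(genes, centroid))
    return min(centroids, key=lambda cls: dist(centroids[cls]))

# Hypothetical toy data: 3 early-stage and 3 late-stage samples, 4 made-up genes.
labels = ["early", "early", "early", "late", "late", "late"]
expr = {
    "GENE_A": [1.0, 1.2, 0.9, 5.0, 5.3, 4.8],  # strongly stage-dependent
    "GENE_B": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],  # flat across stages
    "GENE_C": [3.0, 3.5, 2.5, 4.0, 4.4, 3.7],  # moderately stage-dependent
    "GENE_D": [0.5, 0.4, 0.6, 0.5, 0.6, 0.4],  # flat across stages
}

top_genes = select_top_genes(expr, labels, k=2)
centroids = class_centroids(expr, labels, top_genes)
print(top_genes)  # ['GENE_A', 'GENE_C']
print(predict({"GENE_A": 4.9, "GENE_C": 4.1}, centroids, top_genes))  # late
```

Averaging ranks rather than raw scores keeps scorers with different scales comparable, which is one simple way to combine several feature selection methods into a single ordering. A real analysis would also select genes inside each cross-validation fold to avoid leaking test information into the ranking.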
