Abstract

Significance: Prostate cancer is the most common cancer among men. An accurate diagnosis of its severity at detection plays a major role in improving patient survival. Recently, machine learning models using biomarkers identified from Raman micro-spectroscopy discriminated intraductal carcinoma of the prostate (IDC-P) from cancer tissue with a [...] detection accuracy and differentiated high-grade prostatic intraepithelial neoplasia (HGPIN) from IDC-P with a [...] accuracy.

Aim: To improve the classification performance of machine learning models identifying different types of prostate cancer tissue using a new dimensional reduction technique.

Approach: A radial basis function (RBF) kernel support vector machine (SVM) model was trained on Raman spectra of prostate tissue from a 272-patient cohort (Centre hospitalier de l’Université de Montréal, CHUM) and tested on two independent cohorts of 76 patients [University Health Network (UHN)] and 135 patients (Centre hospitalier universitaire de Québec-Université Laval, CHUQc-UL). Two types of engineered features were used: individual intensity features, i.e., Raman signal intensity measured at particular wavelengths, and novel Raman spectra fitted peak features consisting of peak heights and widths.

Results: Combining engineered features improved classification performance for the three aforementioned classification tasks. The improvements for IDC-P/cancer classification for the UHN and CHUQc-UL testing sets in accuracy, sensitivity, specificity, and area under the curve (AUC) are (numbers in parentheses are associated with the CHUQc-UL testing set): +4% (+8%), +7% (+9%), +2% (6%), +9 (+9) with respect to the current best models. Discrimination between HGPIN and IDC-P was also improved in both testing cohorts: +2.2% (+1.7%), +4.5% (+3.6%), +0% (+0%), +2.3 (+0). While no global improvements were obtained for the normal versus cancer classification task [...], the AUC was improved in both testing sets.

Conclusions: Combining individual intensity features and novel Raman fitted peak features improved the classification performance on two independent and multicenter testing sets in comparison to using only individual intensity features.
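
The four figures of merit reported above (accuracy, sensitivity, specificity, AUC) can be made concrete with a minimal sketch. It is not the authors' evaluation code: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions chosen for illustration, with label 1 standing for the positive class (for example IDC-P).

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn) for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + tn + fp + fn)


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of positive samples that are predicted positive."""
    tp, _, _, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn)


def specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of negative samples that are predicted negative."""
    _, tn, fp, _ = confusion_counts(y_true, y_pred)
    return tn / (tn + fp)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive sample scores higher
    than a random negative sample (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): true labels and classifier scores for 8 spectra.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

print(accuracy(y_true, y_pred), sensitivity(y_true, y_pred),
      specificity(y_true, y_pred), auc(y_true, scores))
```

Accuracy, sensitivity, and specificity depend on the chosen decision threshold, whereas the AUC is computed from the raw scores. This is one way a model can improve its AUC without improving the thresholded metrics.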

Highlights

  • Intraductal carcinoma of the prostate (IDC-P) is an aggressive variant of prostate cancer (PC) recognized as a distinct entity in 2016 by the World Health Organization classification.[1]

  • The improvements for IDC-P/cancer classification for the University Health Network (UHN) and CHUQc-UL testing sets in accuracy, sensitivity, specificity, and area under the curve (AUC) are: +4% (+8%), +7% (+9%), +2% (6%), +9 (+9) with respect to the current best models

  • Discrimination between high-grade prostatic intraepithelial neoplasia (HGPIN) and IDC-P was improved in both testing cohorts: +2.2% (+1.7%), +4.5% (+3.6%), +0% (+0%), +2.3 (+0)

Summary

Introduction

Intraductal carcinoma of the prostate (IDC-P) is an aggressive variant of prostate cancer (PC) recognized as a distinct entity in 2016 by the World Health Organization classification.[1]

The most common techniques consist of using individual intensities and various complex feature selection methods, such as recursive feature elimination,[4] ant colony optimization,[5,6] and L0-SVM, or adaptive boosting,[7] to rank them and select the most relevant ones. While some of these methods are much more powerful than others when only a few dozen individual Raman intensities are considered, all methods provide very similar classification accuracy when the number of retained features is higher than 50.[8] Linear discriminant analysis accompanied by principal component analysis (PCA) is the most common dimensionality reduction technique used in Raman spectroscopy.[9,10,11,12,13,14] Our group showed that Raman peak fitting features have better predictive performance than PCA for cancer/benign brain tissue classification.[15]
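
To illustrate what a fitted peak feature is, the sketch below estimates the height, center, and full width at half maximum (FWHM) of a single Gaussian-shaped peak from the three samples around its maximum (a log-parabola estimate). It is a deliberately simplified stand-in, not the fitting procedure used in the paper: it assumes a baseline-subtracted, strictly positive, uniformly sampled spectrum containing one isolated peak, and the function name and synthetic values are hypothetical.

```python
import math
from typing import Sequence, Tuple


def gaussian_peak_features(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Estimate (height, center, fwhm) of the tallest peak in y.

    For a Gaussian, ln(y) is a parabola in x, so three uniformly spaced
    samples around the maximum determine the peak parameters.
    Assumes uniform spacing in x and strictly positive y.
    """
    # Index of the largest interior sample (needs one neighbor on each side).
    i = max(range(1, len(y) - 1), key=lambda k: y[k])
    d = x[i + 1] - x[i]
    lm, l0, lp = (math.log(y[k]) for k in (i - 1, i, i + 1))

    # Parabola ln(y) = l0 + b*(x - x[i]) + a*(x - x[i])**2 through the 3 points.
    a = (lm - 2.0 * l0 + lp) / (2.0 * d * d)
    b = (lp - lm) / (2.0 * d)
    if a >= 0.0:
        raise ValueError("samples do not describe a local maximum")

    center = x[i] - b / (2.0 * a)             # vertex of the parabola
    height = math.exp(l0 - b * b / (4.0 * a))  # value of the peak at the vertex
    sigma = math.sqrt(-1.0 / (2.0 * a))        # Gaussian standard deviation
    fwhm = 2.0 * math.sqrt(2.0 * math.log(2.0)) * sigma
    return height, center, fwhm


# Synthetic peak (hypothetical values): height 5.0, center 1021.3, sigma 4.0.
xs = [1000.0 + 2.0 * k for k in range(21)]
ys = [5.0 * math.exp(-((v - 1021.3) ** 2) / (2.0 * 4.0 ** 2)) for v in xs]

height, center, fwhm = gaussian_peak_features(xs, ys)
# A combined feature vector could append (height, fwhm) to selected intensities.
features = [ys[5], ys[10], ys[15], height, fwhm]
print(height, center, fwhm)
```

On real spectra, noise and overlapping bands would generally call for a least-squares fit over many points per peak. The three-point estimate is used here only because it shows how a whole band can be summarized by a few parameters instead of many individual intensities.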
