Abstract

Simple Summary: Breast cancer is the second most commonly diagnosed malignancy in women worldwide. In this study, we examine the feasibility of breast tumor characterization based on [18F]FDG-PET/CT images using machine learning (ML) approaches in combination with data pre-processing techniques. ML prediction models for breast cancer detection and the identification of breast cancer receptor status, proliferation rate, and molecular subtypes were established and evaluated. Furthermore, the importance of the most repeatable features was investigated. Results displayed high performance of the malignant/benign tumor differentiation and triple negative tumor subtype ML models. We observed high repeatability of radiomic features for both high-performing predictive models.

Background: This study investigated the performance of ensemble learning holomic models for the detection of breast cancer, receptor status, proliferation rate, and molecular subtypes from [18F]FDG-PET/CT images with and without incorporating data pre-processing algorithms. Additionally, machine learning (ML) models were compared with conventional data analysis using standardized uptake value (SUV) lesion classification. Methods: A cohort of 170 patients with 173 breast cancer tumors (132 malignant, 38 benign) was examined with [18F]FDG-PET/CT. Breast tumors were segmented and radiomic features were extracted following the imaging biomarker standardization initiative (IBSI) guidelines combined with optimized feature extraction. Ensemble learning including five supervised ML algorithms was utilized in a 100-fold Monte Carlo (MC) cross-validation scheme. Data pre-processing methods were incorporated prior to machine learning, including outlier and borderline noisy sample detection, feature selection, and class imbalance correction. Feature importance in each model was assessed by calculating feature occurrence by the R-squared method across MC folds.
Results: Cross-validation demonstrated high performance of the cancer detection model (80% sensitivity, 78% specificity, 80% accuracy, 0.81 area under the curve (AUC)), and of the triple negative tumor identification model (85% sensitivity, 78% specificity, 82% accuracy, 0.82 AUC). The individual receptor status and luminal A/B subtype models yielded low performance (0.46–0.68 AUC). The SUVmax model yielded 0.76 AUC in cancer detection and 0.70 AUC in predicting the triple negative subtype. Conclusions: Predictive models based on [18F]FDG-PET/CT images in combination with advanced data pre-processing steps aid in breast cancer diagnosis and in ML-based prediction of the aggressive triple negative breast cancer subtype.
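
To make the evaluation scheme concrete, the following is a minimal sketch of Monte Carlo cross-validation (repeated random train/test subsampling) and of the reported metrics. It is not the authors' implementation: the function names, the 20% test fraction, and the fixed seed are assumptions chosen for illustration, since the abstract does not state the split ratio.

```python
import random


def monte_carlo_splits(n_samples, n_folds=100, test_fraction=0.2, seed=0):
    """Yield (train_idx, test_idx) pairs from repeated random subsampling.

    test_fraction is an assumed value; the abstract does not specify it.
    """
    rng = random.Random(seed)
    n_test = max(1, round(n_samples * test_fraction))
    indices = list(range(n_samples))
    for _ in range(n_folds):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        yield sorted(shuffled[n_test:]), sorted(shuffled[:n_test])


def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy for labels coded 1 (positive) / 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "accuracy": _ratio(tp + tn, len(pairs)),
    }


def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Tied scores count as half a win (the Mann-Whitney formulation).
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return _ratio(wins, len(pos) * len(neg))
```

In such a scheme, a model would be fitted on each training index set, scored on the matching test set, and the per-fold metrics averaged over the 100 folds.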

Highlights

  • Breast cancer is the most common cancer in females, with over two million cases per year [1]

  • The objectives of this study are: (a) to establish prediction models for breast cancer detection and the identification of breast cancer receptor status, proliferation rate, and molecular subtypes from [18F]FDG-positron emission tomography (PET)/CT images with machine learning (ML), (b) to investigate the effect of data pre-processing on breast tumor characterization ML models, and (c) to compare ML-based prediction models with conventional standardized uptake value (SUV)-based approaches

  • This study investigated the performance of ML predictive models based on [18F]FDG-PET/CT ML analysis of 173 breast tumors in 170 patients with and without data preparation. Our study shows that data pre-processing contributes to model performance of the breast cancer detection ML model (80% …


Summary

Introduction

Breast cancer is the most common cancer in females, with over two million cases per year [1]. In breast cancer treatment, assessment of receptor status (estrogen (ER), progesterone (PR), and Her2-neu (HER2) receptors) by immunohistochemistry (IHC) from breast biopsy is used for tumor subtype classification. This study investigated the performance of ensemble learning holomic models for the detection of breast cancer, receptor status, proliferation rate, and molecular subtypes from [18F]FDG-PET/CT images with and without incorporating data pre-processing algorithms. Data pre-processing methods were incorporated prior to machine learning, including outlier and borderline noisy sample detection, feature selection, and class imbalance correction. Cross-validation demonstrated high performance of the cancer detection model (80% sensitivity, 78% specificity, 80% accuracy, 0.81 area under the curve (AUC)), and of the triple negative tumor identification model (85% sensitivity, 78% specificity, 82% accuracy, 0.82 AUC). The SUVmax model yielded 0.76 AUC in cancer detection and 0.70 AUC in predicting the triple negative subtype.
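
Two of the steps named above, class imbalance correction and ensemble learning, can be illustrated with a small sketch. This is a simplified stand-in and not the study's pipeline: random oversampling is used here in place of whatever correction method the authors applied, majority voting is an assumed combination rule for the five classifiers, and both function names are hypothetical.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of smaller classes until every class
    matches the size of the largest class.

    Stand-in technique (assumption): the source does not name its method.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, count in counts.items():
        pool = [s for s, lab in zip(samples, labels) if lab == cls]
        for _ in range(target - count):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels


def majority_vote(predictions_per_model):
    """Combine per-model label lists (one list per model) into one label per sample.

    With an odd number of models and binary labels there are no ties.
    """
    n_samples = len(predictions_per_model[0])
    combined = []
    for i in range(n_samples):
        votes = Counter(model[i] for model in predictions_per_model)
        combined.append(votes.most_common(1)[0][0])
    return combined
```

Oversampling of this kind would normally be applied to the training portion of each cross-validation fold only, so that duplicated samples never leak into the test set.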

