Although considered a promising tool for cancer diagnosis, optical mammography has largely fallen short of expectations. Modern machine learning (ML) methods offer ways to improve cancer detection in diffuse optical transmission data. We aim to quantitatively evaluate the classification of cancer-positive versus cancer-negative patients using ML methods applied to raw transmission time series from bilateral breast scans acquired while subjects were at rest. We use a support vector machine (SVM) with hyperparameter optimization and cross-validation to systematically explore a range of data preprocessing and feature-generation strategies, and we apply an automated ML (AutoML) framework to validate our findings. Classification performance is quantified with receiver operating characteristic (ROC) curves and the corresponding area under the curve (AUC). For the available sample group, which includes 18 cancer patients, we achieve an AUC of up to 93.3% for SVM classification and up to 95.0% for the AutoML classifier. ML thus offers a viable strategy for clinically relevant breast cancer diagnosis using diffuse optical transmission measurements, and its diagnostic performance on raw data can exceed that of traditional statistical biomarkers derived from reconstructed image time series. To achieve clinically relevant performance, our ML approach requires simultaneous bilateral scanning of the breasts with spatially dense channel coverage.
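The AUC used to report classification performance can be read as the probability that a randomly chosen cancer-positive subject receives a higher classifier score than a randomly chosen cancer-negative one. The minimal Python sketch below computes it in exactly that pairwise (Mann-Whitney) form. It is not the authors' evaluation code: the function name `roc_auc`, the label encoding (1 = cancer-positive, 0 = cancer-negative), and the toy scores are hypothetical and serve only for illustration.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the pairwise Mann-Whitney formulation.

    labels: 1 for cancer-positive, 0 for cancer-negative (hypothetical encoding).
    scores: classifier outputs (e.g. SVM decision-function values); a higher
            score means "more likely cancer-positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")

    # Count correctly ordered (positive, negative) pairs; ties count as half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data: three positives and three negatives.
    labels = [1, 1, 0, 0, 1, 0]
    scores = [0.9, 0.4, 0.35, 0.1, 0.8, 0.7]
    print(f"AUC = {roc_auc(labels, scores):.3f}")  # prints "AUC = 0.889"
```

In the toy data, 8 of the 9 positive/negative pairs are ranked correctly (only the positive scored 0.4 falls below the negative scored 0.7), giving an AUC of 8/9. The pairwise loop is quadratic in sample size, which is harmless for cohorts of a few dozen subjects; for large datasets a rank-based formula gives the same value in O(n log n).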