Abstract
We aimed to establish a reliable machine learning model to predict malignancy in breast lesions identified by ultrasound (US) and to optimize the negative predictive value (NPV) in order to minimize unnecessary biopsies. We included clinical and ultrasonographic attributes from 1526 breast lesions classified as BI-RADS 3, 4a, 4b, 4c, 5, and 6 that underwent US-guided breast biopsy in four institutions. We selected the most informative attributes to train nine machine learning models, ensemble models, and models with tuned thresholds to make inferences about the diagnosis of BI-RADS 4a and 4b lesions (validation dataset). We tested the performance of the final model with 403 new suspicious lesions. The most informative attributes were the shape, margin, orientation, and size of the lesions, the resistance index of the internal vessel, the age of the patient, and the presence of a palpable lump. The highest mean NPV was achieved with the K-Nearest Neighbors algorithm (97.9%). Building ensembles did not improve performance. Tuning the threshold did improve the performance of the models, and we chose the XGBoost algorithm with the tuned threshold as the final model. The tested performance of the final model was: NPV 98.1%, false negative 1.9%, positive predictive value 77.1%, false positive 22.9%. Applying this final model, we would have missed 2 of the 231 malignant lesions of the test dataset (0.8%). Machine learning can help physicians predict malignancy in suspicious breast lesions identified by US. Our final model would be able to avoid 60.4% of the biopsies in benign lesions while missing less than 1% of the cancer cases.
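The abstract does not describe how the decision threshold was tuned, so the following is only a minimal sketch of one plausible approach: among a set of candidate thresholds, keep the largest one whose NPV still meets a target, so that as many lesions as possible are predicted benign without letting the NPV drop. The function names, the candidate thresholds, and the toy labels and scores are all hypothetical and are not taken from the study.

```python
# Toy illustration of tuning a decision threshold for NPV.
# All data and names below are hypothetical, not from the study.

def confusion_counts(y_true, scores, threshold):
    """Count TP, FP, TN, FN when score >= threshold means 'malignant' (1)."""
    tp = fp = tn = fn = 0
    for label, score in zip(y_true, scores):
        predicted = 1 if score >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def npv(tn, fn):
    """Negative predictive value: TN / (TN + FN); None if nothing is predicted negative."""
    return tn / (tn + fn) if (tn + fn) else None


def ppv(tp, fp):
    """Positive predictive value: TP / (TP + FP); None if nothing is predicted positive."""
    return tp / (tp + fp) if (tp + fp) else None


def tune_threshold(y_true, scores, target_npv, candidates):
    """Return the largest candidate threshold whose NPV is >= target_npv, or None."""
    best = None
    for threshold in sorted(candidates):
        _, _, tn, fn = confusion_counts(y_true, scores, threshold)
        value = npv(tn, fn)
        if value is not None and value >= target_npv:
            best = threshold
    return best


# Hypothetical labels (1 = malignant, 0 = benign) and model scores.
y_true = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90]
candidates = [0.15, 0.25, 0.33, 0.45, 0.65]

chosen = tune_threshold(y_true, scores, target_npv=1.0, candidates=candidates)
tp, fp, tn, fn = confusion_counts(y_true, scores, chosen)
print("threshold:", chosen, "NPV:", npv(tn, fn), "PPV:", ppv(tp, fp))
```

In this sketch, 1 − NPV is the share of benign predictions that are actually malignant, which appears to be the relationship between the reported "NPV 98.1%" and "false negative 1.9%". In practice the threshold would be chosen on validation data and the resulting metrics reported on a separate test set, as the abstract describes.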