Evaluation of the performance of traditional machine learning algorithms, convolutional neural network and AutoML Vision in ultrasound breast lesions classification: a comparative study.

Ka Wing Wan, Chun Hoi Wong, Hoi Ying Fong, Dejian Fan, Ho Fung Ip, Pak Leung Yuen, Michael Ying

doi:10.21037/qims-20-922

Open Access

https://doi.org/10.21037/qims-20-922

Journal: Quantitative Imaging in Medicine and Surgery
Publication Date: Apr 1, 2021
Citations: 45
License type: CC BY-NC-ND

Abstract

In recent years, artificial intelligence has been increasingly applied in medicine, from computer-aided diagnosis (CAD) to patient prognosis prediction. Because not all healthcare professionals have the expertise required to develop a CAD system, this study investigated the feasibility of using AutoML Vision, a highly automated machine learning model, for future clinical applications by comparing it with some commonly used CAD algorithms in differentiating benign from malignant breast lesions on ultrasound.

A total of 895 breast ultrasound images were obtained from two online open-access breast ultrasound image datasets. Traditional machine learning models (comprising seven commonly used CAD algorithms), using three extracted content-based radiomic features (Hu Moments, Color Histogram, Haralick Texture), and a convolutional neural network (CNN) model were built in Python. AutoML Vision was trained on Google Cloud Platform. Sensitivity, specificity, F1 score and average precision (AUCPR) were used to evaluate the diagnostic performance of the models. Cochran's Q test was used to test for significant differences among all studied models, and the McNemar test was used as the post-hoc test for pairwise comparisons. The proposed AutoML model was also compared with related studies that used similar medical imaging modalities to characterize breast lesions as benign or malignant.

There was a significant difference in diagnostic performance among the traditional machine learning classifiers studied (P<0.05). Random Forest achieved the best performance among them in differentiating benign from malignant breast lesions (accuracy: 90%; sensitivity: 71%; specificity: 100%; F1 score: 0.83; AUCPR: 0.90), which was statistically comparable to the performance of CNN (accuracy: 91%; sensitivity: 82%; specificity: 96%; F1 score: 0.87; AUCPR: 0.88) and AutoML Vision (accuracy: 86%; sensitivity: 84%; specificity: 88%; F1 score: 0.83; AUCPR: 0.95) based on Cochran's Q test (P>0.05).

In this study, the performance of AutoML Vision was not significantly different from that of Random Forest (the best classifier among the traditional machine learning models) or CNN. AutoML Vision showed relatively high accuracy, comparable to that of commonly used classifiers, which may support its future application in clinical practice.
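
As an illustration of the evaluation measures named in the abstract, the following is a minimal Python sketch of how sensitivity, specificity, accuracy and F1 score can be derived from confusion-matrix counts, and how an exact two-sided McNemar test can compare two classifiers on the same test images. It is not the authors' code: the function names, the choice of malignant as the positive class, the use of the exact (binomial) form of the McNemar test, and the example counts are all assumptions made for illustration. AUCPR and Cochran's Q test are omitted because they need per-image scores and a chi-square distribution, respectively.

```python
from math import comb


def diagnostic_metrics(tp, fp, tn, fn):
    """Compute accuracy, sensitivity, specificity and F1 from confusion counts.

    Assumes malignant is the positive class and that no denominator is zero
    (both are assumptions of this sketch, not taken from the paper).
    """
    sensitivity = tp / (tp + fn)      # true positive rate (recall)
    specificity = tn / (tn + fp)      # true negative rate
    precision = tp / (tp + fp)        # positive predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value for paired classifier results.

    b: images model A classified correctly and model B incorrectly
    c: images model A classified incorrectly and model B correctly
    Only these discordant pairs carry information about the difference.
    """
    n = b + c
    if n == 0:
        return 1.0  # the two models never disagree
    k = min(b, c)
    # Under H0 the discordant pairs follow Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical counts, chosen only to show the call pattern.
metrics = diagnostic_metrics(tp=40, fp=5, tn=45, fn=10)
p_value = mcnemar_exact(b=1, c=9)
print(metrics, p_value)
```

With real data, the counts would come from each model's predictions on a shared held-out test set, and the McNemar test would be run for each pair of models only after the omnibus Cochran's Q test indicated a difference.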

Full Text