Abstract

e14068 Background: Interpreting mammograms is challenging, especially in young women, who tend to have dense breasts. Artificial intelligence (AI) promises to improve breast cancer detection; however, these systems should be tested on different datasets. Our aim was to evaluate the performance of a publicly available deep convolutional neural network, developed by Wu et al. (IEEE Trans. Med. Imaging, 2019), when applied to mammograms of young women. Methods: The test dataset consisted of mammograms obtained on a single occasion from 135 young women (up to 40 years old) on a Siemens mammography system. Each exam consisted of 4 full-field digital mammography images and had two labels (left malignant and right malignant). Mammograms were analyzed by a single mammography-trained radiologist using the BI-RADS reporting tool. Among the 270 labels, 170 were malignant and 100 were non-malignant. We used the program developed by Wu et al., which, according to its authors, achieves an AUC of 0.895 for the general population. As a preliminary test, we ran this program on the publicly available INbreast dataset and obtained an AUC of 0.8708, very close to the result reported by the authors. Results: Applied to our dataset of young women, the program achieved an AUC of 0.876 with a standard error of 0.0290. At the equal error rate point of the ROC curve, specificity and sensitivity are both 0.774. From this result we conclude that, at least for our dataset, cancer detection in young women is not substantially more difficult for an AI system than in the general population. We fine-tuned the weights of the original network to the population of young women using transfer learning and obtained a slight improvement in AUC: 0.9018±0.0528, where the mean and the standard error were obtained using 5-fold cross-validation. Because the improvement was small and the standard errors are large, a larger test set would be needed to confirm that the observed improvement is real.
Conclusions: Based on the experimental data, we conclude that there is no substantial degradation in accuracy when a mammogram screening program developed for the general population is used for young women. It also appears possible to obtain a slight improvement in accuracy by fine-tuning the network for the population of young women.
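
The Results report an AUC, its standard error, and the sensitivity and specificity at the equal error rate point. The abstract does not say how these were computed, so the following is only a minimal sketch of one common approach: AUC as the Mann-Whitney statistic, the Hanley-McNeil approximation for its standard error, and a threshold scan for the equal error rate point. All function names and the toy scores are hypothetical, and the standard error from this approximation need not match the reported 0.0290.

```python
import math


def roc_auc(labels, scores):
    """AUC as the Mann-Whitney statistic: the probability that a malignant
    label scores higher than a non-malignant one (ties count one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def hanley_mcneil_se(auc, n_pos, n_neg):
    """Hanley-McNeil approximation of the AUC standard error.
    Assumption: the abstract does not name the estimator it used."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    var = (auc * (1 - auc)
           + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    return math.sqrt(var)


def equal_error_point(labels, scores):
    """Scan every observed score as a threshold (predict malignant when
    score >= threshold) and return (threshold, sensitivity, specificity)
    where sensitivity and specificity are closest to each other."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < t)
        sens, spec = tp / n_pos, tn / n_neg
        if best is None or abs(sens - spec) < abs(best[1] - best[2]):
            best = (t, sens, spec)
    return best


# Toy data for illustration only (not from the study).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
toy_eer = equal_error_point(toy_labels, toy_scores)

# Label counts taken from the abstract: 170 malignant, 100 non-malignant.
se_estimate = hanley_mcneil_se(0.876, n_pos=170, n_neg=100)
```

With real model outputs, `labels` would hold the 270 per-breast labels and `scores` the network's predicted malignancy probabilities.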
