Background: Digital mammography is a key tool for diagnosing breast cancer (BC); early detection of pathology can reduce mortality by 20-40%. Many AI services have now been developed to automate the evaluation of such studies. Aims: To compare the results of different types and versions of AI services against physician assessments in the analysis of mammography studies. Materials and methods: We compared several binary scales for assessing mammography examinations, as well as several types and versions of AI services, using accuracy metrics, the Matthews correlation coefficient, and the maximum Youden index. Results: The comparative analysis showed that the number of detected pathologies and the measured accuracy of the AI services depend on the binary scale used to evaluate the digital mammography studies. In addition, the maximum Youden index makes it possible to assess whether differences between services and scales are statistically significant. According to this metric, AI service 1 and its version 3 perform best, which is consistent with most accuracy metrics. Conclusions: These results can be used in any field of medicine when choosing AI services for clinical deployment.
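For readers unfamiliar with the two less common metrics named above, the following is a minimal sketch of how the Matthews correlation coefficient and the maximum Youden index can be computed for a binary reference label and a continuous AI score. The function names, the toy labels, and the toy scores are illustrative assumptions and are not taken from the study; the study's own binary scales and statistical testing are not reproduced here.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient; defined as 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def max_youden_index(y_true, scores):
    """Scan every observed score as a threshold and return (max J, threshold),
    where J = sensitivity + specificity - 1. Ties keep the lowest threshold."""
    best_j, best_thr = -1.0, None
    for thr in sorted(set(scores)):
        y_pred = [1 if s >= thr else 0 for s in scores]
        tp, tn, fp, fn = confusion_counts(y_true, y_pred)
        sens = tp / (tp + fn) if (tp + fn) else 0.0
        spec = tn / (tn + fp) if (tn + fp) else 0.0
        j = sens + spec - 1
        if j > best_j:
            best_j, best_thr = j, thr
    return best_j, best_thr


# Hypothetical toy data: 1 = pathology on the chosen binary scale, 0 = no pathology.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.35, 0.6, 0.4, 0.7, 0.8, 0.9]  # made-up AI service outputs

j, thr = max_youden_index(y_true, scores)
y_pred = [1 if s >= thr else 0 for s in scores]
mcc = matthews_corrcoef(*confusion_counts(y_true, y_pred))
print(f"max Youden J = {j:.2f} at threshold {thr}, MCC = {mcc:.3f}")
```

Changing how the physician assessment is binarized changes `y_true`, and therefore both metrics, which is one way to see why the choice of binary scale can affect the apparent ranking of services.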