Evaluating the trustworthiness of deep learning-based computer-aided diagnosis (CAD) systems is challenging, and model selection must balance trust against performance. When many candidate models are scored on several evaluation metrics, choosing the best one becomes a complex multi-criteria decision-making problem. For COVID-19 diagnosis, establishing trust requires combining physicians’ evaluation with AI techniques. In this study, 1551 chest X-rays were analyzed using deep transfer learning (DTL) with six models and four SVM kernels, yielding 24 hybrid DTL–SVM models. The models were evaluated on seven metrics using fuzzy multiple-criteria decision-making (MCDM), in which a dynamic decision matrix is used to select the best model. This matrix incorporates fuzzy-weighted zero-inconsistency (FWZIC) to derive the criteria weight coefficients and VIseKriterijumska Optimizacija I Kompromisno Resenje (VIKOR) to benchmark the models. Grad-CAM was used to examine the best model on 16 images, supporting explainability. The top-performing models identified include SqueezeNet-SVM linear, VGG19-SVM linear, and VGG19-SVM. Sensitivity analysis quantified the impact of changing the criteria weights on the ranking. A physician expert validated the fuzzy MCDM results through Grad-CAM analysis, which is a novel aspect of this study. The proposed framework was benchmarked against seven other studies and achieved a perfect score in four crucial areas. Trustworthiness is essential for CAD systems, and this study addresses the trust and performance challenges of AI model-based CAD systems. It systematically evaluated the requirements for trustworthiness, including accountability, fairness, robustness, accuracy, and reproducibility, and the results supported the framework's trustworthiness.
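To make the benchmarking step concrete, the following is a minimal sketch of a crisp (non-fuzzy) VIKOR ranking over a small decision matrix. It is an illustration under stated assumptions, not the study's implementation: the function name `vikor`, the toy metric values, the weights, and the compromise parameter `v = 0.5` are all hypothetical, every criterion is assumed to be a benefit criterion (higher is better), and the weights are supplied by hand rather than derived with FWZIC.

```python
def vikor(matrix, weights, v=0.5):
    """Return VIKOR Q scores (lower is better).

    matrix  : one row per alternative, one column per criterion
    weights : one weight per criterion (assumed here to sum to 1)
    v       : weight of group utility versus individual regret
    All criteria are assumed to be benefit criteria.
    """
    n_criteria = len(weights)
    best = [max(row[j] for row in matrix) for j in range(n_criteria)]
    worst = [min(row[j] for row in matrix) for j in range(n_criteria)]

    s_scores, r_scores = [], []
    for row in matrix:
        terms = []
        for j in range(n_criteria):
            span = best[j] - worst[j]
            # A criterion on which all alternatives tie adds no regret.
            gap = 0.0 if span == 0 else (best[j] - row[j]) / span
            terms.append(weights[j] * gap)
        s_scores.append(sum(terms))  # group utility S_i
        r_scores.append(max(terms))  # individual regret R_i

    def rescale(values):
        lo, hi = min(values), max(values)
        return [0.0 if hi == lo else (x - lo) / (hi - lo) for x in values]

    s_norm, r_norm = rescale(s_scores), rescale(r_scores)
    return [v * s + (1 - v) * r for s, r in zip(s_norm, r_norm)]


# Hypothetical toy data: 3 alternatives scored on 3 metrics.
toy_matrix = [
    [0.95, 0.90, 0.93],  # alternative 0
    [0.90, 0.95, 0.91],  # alternative 1
    [0.85, 0.80, 0.88],  # alternative 2
]
toy_weights = [0.5, 0.3, 0.2]  # hand-picked, not FWZIC output

q_scores = vikor(toy_matrix, toy_weights)
# Indices of the alternatives sorted from best (lowest Q) to worst.
ranking = sorted(range(len(q_scores)), key=q_scores.__getitem__)
print(ranking, q_scores)
```

In this sketch the alternative with the lowest Q score ranks first. Re-running `vikor` with perturbed weights is one simple way to see how a sensitivity analysis over the criteria weights could change the ranking.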