Abstract
The success of Convolutional Neural Networks has driven their widespread adoption in Industry 5.0, especially in safety-critical applications where model robustness is essential. Adversarial training is the standard approach for improving robustness, but it alters how a model fuses features, introducing complex trade-offs and making the model's behavior harder to understand. This lack of transparency conflicts with the demand for interpretable and trustworthy artificial intelligence in modern industry, where applications such as autonomous driving and intelligent medical systems require model interpretability to assign post-accident responsibility and support legal and liability decisions. We introduce an approach to enhance the interpretability of adversarially trained Convolutional Neural Networks, providing detailed and quantifiable insights through a six-dimensional semantic framework. The framework, which examines objects, parts, scenes, materials, textures, and colors, yields fine-grained explanations that closely mirror human cognitive processes. Extensive experiments across multiple classical models show that adversarial training induces structure-dependent shifts in feature preferences, shedding light on the trade-off between standard and robust accuracy in adversarially trained Convolutional Neural Networks. These findings provide rational support for deploying artificial intelligence in real-world applications and promote the development of explainable artificial intelligence in modern industries.
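For readers unfamiliar with the training regime referenced above, the following is a minimal sketch of adversarial training with projected gradient descent (PGD) in PyTorch. It is not the paper's method or settings; the perturbation budget eps, step size alpha, and step count are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Craft adversarial examples within an L-infinity ball of radius eps around x."""
    # Random start inside the eps-ball (illustrative hyperparameters).
    x_adv = (x.detach() + torch.empty_like(x).uniform_(-eps, eps)).clamp(0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        # Ascend the loss, then project back into the eps-ball and valid pixel range.
        x_adv = x_adv.detach() + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv

def adversarial_training_step(model, optimizer, x, y):
    """One minibatch update: train on adversarial examples instead of clean inputs."""
    model.eval()                     # keep batch-norm statistics fixed while attacking
    x_adv = pgd_attack(model, x, y)
    model.train()
    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    optimizer.step()
    return loss.item()
```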