Improving interpretability via regularization of neural activation sensitivity

Ofir Moshe, Gil Fidel, Ron Bitton, Asaf Shabtai

doi:10.1007/s10994-024-06549-4

Abstract

State-of-the-art deep neural networks (DNNs) are highly effective at tackling many real-world tasks. However, their widespread adoption in mission-critical contexts is limited by two major weaknesses: their susceptibility to adversarial attacks and their opaqueness. The former raises concerns about DNNs’ security and generalization in real-world conditions, while the latter directly impacts interpretability. The lack of interpretability diminishes user trust, as it is difficult to have confidence in a model’s decision when its reasoning is not aligned with human perspectives. In this research, we (1) examine the effect of adversarial robustness on interpretability, and (2) present a novel approach for improving DNNs’ interpretability that is based on the regularization of neural activation sensitivity. We compare the interpretability of models trained using our method with that of standard models and of models trained using state-of-the-art adversarial robustness techniques. Our results show that adversarially robust models are more interpretable than standard models, and that models trained using our proposed method are more interpretable than adversarially robust models. (Code provided in supplementary material.)
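The abstract does not spell out the regularizer, so the following is only an illustrative sketch of one possible reading of "regularization of neural activation sensitivity", not the authors' method. It assumes a single tanh layer and measures sensitivity as the squared Frobenius norm of the Jacobian of the hidden activations with respect to the input. The function names, the weight `lam`, and the toy values are all hypothetical.

```python
import math

def hidden_activations(W, b, x):
    """Return h = tanh(W x + b) for a single dense layer (assumed architecture)."""
    return [math.tanh(sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i)
            for row, b_i in zip(W, b)]

def activation_sensitivity_penalty(W, b, x):
    """Squared Frobenius norm of dh/dx, where dh_i/dx_j = (1 - h_i^2) * W[i][j]."""
    h = hidden_activations(W, b, x)
    return sum(((1.0 - h_i ** 2) * w_ij) ** 2
               for h_i, row in zip(h, W) for w_ij in row)

def regularized_loss(task_loss, W, b, x, lam=0.1):
    """Hypothetical total loss: task loss plus lam times the sensitivity penalty."""
    return task_loss + lam * activation_sensitivity_penalty(W, b, x)

# Toy layer with 2 inputs and 2 hidden units (values chosen arbitrarily).
W = [[1.0, -2.0], [0.5, 0.0]]
b = [0.0, 0.0]
x = [0.0, 0.0]
penalty = activation_sensitivity_penalty(W, b, x)
```

In this toy setting the penalty is largest where the units are in their linear regime and shrinks as they saturate, so minimizing it alongside the task loss would push activations to respond less sharply to small input changes. A real training setup would compute such a term with automatic differentiation over a deep network rather than a closed-form Jacobian.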

Journal: Machine Learning
Publication Date: Jun 19, 2024
License type: CC BY 4.0

Similar Papers

2-in-1 Accelerator: Enabling Random Precision Switch for Winning Both Adversarial Robustness and Efficiency
Yonggan Fu ... Yingyan Lin
17 Oct 2021

Adversarial Robustness of Deep Convolutional Neural Network-based Image Recognition Models: A Review
...
雷达学报 | VOL. 10
28 Aug 2021

Efficient Error-correcting Output Codes for Adversarial Learning Robustness
Li Wan ... Margreta Kuijper
16 May 2022

XploreNAS: Explore Adversarially Robust and Hardware-efficient Neural Architectures for Non-ideal Xbars
Abhiroop Bhattacharjee ... Abhishek Moitra
ACM Transactions on Embedded Computing Systems | VOL. 22
24 Jul 2023
