Abstract

Images are among the most dominant data sources in information systems, and the convolutional neural network (CNN) has been the most popular model for processing them in recent decades. Nevertheless, the visual features extracted by a CNN are high-dimensional and redundant, which hampers both semantic interpretation and downstream prediction. In this study, an integrated framework is introduced that selects critical features from a trained CNN model and visualizes them with an explainable AI method. To build this framework, an approximately linear relationship between the visual features and the dependent variable is analytically proved by considering the products of activations and gradients in the CNN. Four datasets covering both classification and regression tasks serve as case studies to evaluate the framework. The results of all cases consistently show that CNN-extracted features are linearly correlated with the dependent variable at arbitrary layers, and that linear feature selection methods successfully target the critical ones. In particular, the highest adjusted R² of 0.61 is achieved on the SCUT-FBP dataset for regression with only 15 selected features, which is comparable to the performance of all 2048 features. These critical features can further be mapped back to the raw images with Grad-CAM to illustrate their semantics, which are shown to be closer to human perception than those of other features. In addition, the critical features help improve the prediction performance of downstream tasks. Both theoretical and practical implications of the proposed framework are discussed.
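To make the core computation concrete, the sketch below shows one plausible reading of the abstract's pipeline: per-channel products of activations and gradients as features, followed by a linear selection step. This is not the authors' code; PyTorch, a torchvision ResNet-50 (with "layer4" as the chosen layer, giving 2048 channels), scikit-learn's LassoCV as the linear selector, and the toy tensors standing in for a real dataset are all illustrative assumptions.

```python
# Hedged sketch, not the paper's exact pipeline. Assumes PyTorch and
# scikit-learn; ResNet-50's "layer4" and LassoCV are stand-ins for the
# paper's chosen layer and linear feature selection method.
import torch
import torchvision.models as models
from sklearn.linear_model import LassoCV

model = models.resnet50(weights="IMAGENET1K_V2").eval()

# Capture the activations of the chosen layer with a forward hook.
feats = {}
model.layer4.register_forward_hook(lambda mod, inp, out: feats.update(act=out))

def act_grad_features(x: torch.Tensor, target: int = 0) -> torch.Tensor:
    """Per-channel activation * gradient products, pooled over space.

    This is the Grad-CAM-style quantity the abstract refers to: layer
    activations multiplied by the gradients of the output score with
    respect to those activations, averaged spatially to yield one scalar
    feature per channel (2048 for ResNet-50's last block).
    """
    out = model(x)                                   # populates feats["act"]
    score = out[:, target].sum()                     # scalar score to differentiate
    grads = torch.autograd.grad(score, feats["act"])[0]
    return (feats["act"] * grads).mean(dim=(2, 3))   # shape: (batch, 2048)

# Toy stand-ins for a real dataset (images and a scalar label per image).
images = torch.randn(32, 3, 224, 224)
labels = torch.randn(32).numpy()

X = act_grad_features(images).detach().numpy()       # (32, 2048) feature matrix
selector = LassoCV(cv=5).fit(X, labels)              # linear feature selection
critical = (selector.coef_ != 0).nonzero()[0]        # indices of selected channels
print(f"{critical.size} critical features:", critical[:15])
```

Each selected channel index can then be traced back to pixel space in the Grad-CAM manner, by up-sampling that channel's gradient-weighted activation map over the input image; this is how the framework illustrates the semantics of the critical features.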
