In this paper, we propose a convolutional neural network (CNN) model for device-free fingerprinting indoor localization based on Wi-Fi channel state information (CSI). We also develop an interpretation framework to understand the representations learned by the model. By quantifying and visualizing the CNN in comparison with a fully connected feedforward deep neural network (DNN), also known as a multilayer perceptron, we observe that each model automatically identifies location-specific patterns; these patterns differ across models, however, and are linked to each model's performance. Furthermore, we quantify how features deemed relevant or irrelevant by the adopted metric (i.e., relevance scores calculated by relevance propagation techniques) affect the performance results. Interpreting learning models for wireless applications is challenging because human sensory intuition and points of reference are lacking. The results presented in this paper provide visually perceivable evidence and plausible explanations for the performance advantages of the CNN in this important application.