With advances in sensing technology and deep learning, deep-learning-based human activity recognition from sensor signal data has been actively studied. While deep neural networks can automatically extract features suited to the target task and achieve high recognition performance, they cannot select important input sensor signals, which limits their interpretability. Since not all signals from wearable sensors are relevant to the target task, knowing which sensor signals matter is insightful information for practitioners. In this article, we propose an interpretable and accurate convolutional neural network capable of selecting important sensor signals. This is enabled by spatially sparse convolutional filters whose sparsity is imposed by a spatial group lasso. Although accuracy and interpretability usually trade off against each other, experimental results on the OPPORTUNITY activity recognition dataset show that the proposed model improves recognition performance while selecting important sensor signals, thereby providing interpretability.
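As a minimal sketch of the idea, the spatial group lasso can be written as an L1 norm over per-sensor L2 group norms of the convolutional filter bank: all filter coefficients attached to one input sensor form a group, so the penalty drives entire sensor channels to zero and thereby performs sensor selection. The shapes and names below are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def spatial_group_lasso(weights, lam=1e-3):
    """Group-lasso penalty over input sensor channels.

    `weights` is a 1-D conv filter bank of (hypothetical) shape
    (out_channels, in_sensors, kernel_size). Every coefficient
    attached to one sensor is treated as a single group, so
    shrinking a group norm to zero removes that sensor entirely.
    """
    # L2 norm of each sensor's group of coefficients ...
    group_norms = np.sqrt((weights ** 2).sum(axis=(0, 2)))
    # ... summed over sensors: an L1 norm of group L2 norms.
    return lam * group_norms.sum()

# A sensor whose group norm is exactly zero is pruned from the input.
W = np.zeros((4, 3, 5))
W[:, 0, :] = 1.0                         # only sensor 0 carries weight
print(spatial_group_lasso(W, lam=1.0))   # -> sqrt(4 * 5), sensor 0's group norm
```

Added to the task loss, this penalty is non-differentiable at zero, so in practice it is typically handled with a proximal-gradient step or a smooth approximation during training.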