This paper proposes a novel sparse tensor dictionary learning algorithm for hyperspectral images (HSIs) and a convolutional neural network (CNN) classification method built on it. An HSI is naturally represented as a three-dimensional (3-D) cube, and exploiting joint spatial-spectral information can significantly improve HSI classification accuracy. The proposed atom-substituted tensor dictionary learning (ASTDL) algorithm therefore uses tensor techniques to extract 3-D joint spatial-spectral features directly from HSI cubes. A sparsity constraint is enforced on the coefficient tensors, consistent with the sparse nature of HSIs. The proposed ASTDL-enhanced CNN (ASTDL-CNN) classification method uses a 3-D CNN to extract deep features from the feature tensors produced by ASTDL and to perform pixel-wise classification. Applying ASTDL before the CNN to extract intrinsic tensor features alleviates within-class spatial-spectral variation and reduces the CNN's need for labeled data, while the CNN works as a 3-D classifier and provides the final classification results. In addition, a majority vote is applied to the classification map produced by the CNN to refine the classification. The proposed method is evaluated on three real HSI data sets. Its results are competitive with those of the compared state-of-the-art methods, demonstrating that ASTDL-CNN provides accurate and robust classification.
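The majority-vote refinement mentioned above can be illustrated with a minimal sketch. The abstract does not specify the neighborhood size or how ties are handled, so the square window, the `radius` parameter, the tie rule (keep the original label), and the function name `majority_vote` are all assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter


def majority_vote(label_map, radius=1):
    """Relabel each pixel with the most frequent label in its square window.

    Assumptions (not taken from the paper): a (2*radius+1) x (2*radius+1)
    window clipped at the image border, and ties keep the original label.
    """
    rows, cols = len(label_map), len(label_map[0])
    refined = [row[:] for row in label_map]  # copy so the input is not modified
    for i in range(rows):
        for j in range(cols):
            # Collect the labels inside the window, clipped to the map bounds.
            window = [
                label_map[y][x]
                for y in range(max(0, i - radius), min(rows, i + radius + 1))
                for x in range(max(0, j - radius), min(cols, j + radius + 1))
            ]
            counts = Counter(window)
            best = max(counts.values())
            # Only relabel when another label strictly outnumbers the current one.
            if counts[label_map[i][j]] < best:
                refined[i][j] = counts.most_common(1)[0][0]
    return refined


# Toy classification map with one isolated, likely misclassified pixel (label 3).
noisy = [
    [1, 1, 1, 2],
    [1, 3, 1, 2],
    [1, 1, 2, 2],
]
refined = majority_vote(noisy)
```

In this toy map the isolated label 3 is replaced by its dominant neighbor label 1, while the boundary between classes 1 and 2 is left unchanged because no competing label strictly outnumbers the existing one there.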