Footprint images differ mainly in the proportions of the parts of the foot and in the distribution of pressure, so footprint recognition can be treated as a fine-grained image classification problem. Moreover, variation in body weight and muscle strength makes it harder to distinguish left feet from right feet. Although a fine-grained image classification network is a feasible solution to the footprint classification problem, such networks generally have a large number of parameters; we therefore aim to build a lightweight classification network suited to several small footprint datasets. In this paper, we propose a multimodal footprint recognition algorithm based on progressive multi-granularity feature fusion. First, a shallow densely connected network extracts features, and its feature extraction ability is improved through channel concatenation and feature reuse. Second, to learn from footprint images at different granularities, a progressive training strategy and a puzzle scrambler are applied to the model. Finally, factorized bilinear coding aggregates local features into a more discriminative global representation. Experiments show that our network achieves classification accuracy comparable to that of some fine-grained image classification models (PMG, MSEC) on the complete pressure footprint dataset, with far fewer parameters. Our network also achieves good classification results on several other footprint datasets, which demonstrates its effectiveness. In addition, an ablation experiment verifies the effectiveness of the progressive strategy and the factorized bilinear coding.
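To make the puzzle scrambler concrete, the following is a minimal sketch of one common way such a step can be written: the image is cut into an n x n grid and the patches are shuffled, so that the network is forced to attend to local detail at a chosen granularity. The function name `jigsaw_scramble`, its parameters, and the use of NumPy are assumptions made for illustration; this is not the authors' implementation, and the details of their scrambler may differ.

```python
import numpy as np


def jigsaw_scramble(image, n, rng=None):
    """Split an H x W (x C) image into an n x n grid and shuffle the patches.

    Hypothetical helper for illustration only. Assumes H and W are
    divisible by n.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    if h % n or w % n:
        raise ValueError("image height and width must be divisible by n")
    ph, pw = h // n, w // n

    # Collect the patches in row-major order.
    patches = [
        image[r * ph:(r + 1) * ph, c * pw:(c + 1) * pw]
        for r in range(n)
        for c in range(n)
    ]

    # Draw a random permutation and write each patch to its new grid cell.
    order = rng.permutation(n * n)
    out = np.empty_like(image)
    for dst, src in enumerate(order):
        r, c = divmod(dst, n)
        out[r * ph:(r + 1) * ph, c * pw:(c + 1) * pw] = patches[src]
    return out
```

In a progressive schedule, one plausible choice (an assumption here, not stated in the text) is to use a fine grid such as n = 8 for the earliest training step and coarser grids for later steps, ending with the unscrambled image (n = 1).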