Gait has recently attracted attention as a practical biometric for devices that naturally sense walking patterns. In the present study, we explored the feasibility of using a multimodal smart insole for identity recognition. We used sensor insoles that we designed and implemented to collect kinetic and kinematic data from 59 participants walking outdoors. We then evaluated the performance of four neural network architectures: a baseline convolutional neural network (CNN), a CNN with a multi-stage feature extractor, a CNN with an extreme learning machine classifier using sensor-level fusion, and a CNN with an extreme learning machine classifier using feature-level fusion. The networks were trained on segmented insole data with 0%, 50%, and 70% segmentation overlap. For 70% segmentation overlap and both-side data, we obtained mean accuracies of 72.8% ±0.038, 80.9% ±0.036, 80.1% ±0.021, and 93.3% ±0.009 for the four networks, respectively. The results suggest that multimodal sensor-enabled footwear could serve as a biometric in the next generation of body sensor networks.
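
The abstract mentions segmenting the insole recordings into windows with 0%, 50%, or 70% overlap before training. The following is a minimal sketch of such fixed-length window segmentation; the window length, channel count, sampling rate, and NumPy-based layout are illustrative assumptions, not the study's exact configuration.

```python
import numpy as np

def segment_windows(signal: np.ndarray, window_len: int, overlap: float) -> np.ndarray:
    """Split a (time, channels) recording into overlapping fixed-length windows.

    overlap is a fraction in [0, 1), e.g. 0.0, 0.5, or 0.7 as in the abstract.
    """
    # Step size shrinks as overlap grows; 70% overlap advances by 30% of a window.
    step = max(1, int(round(window_len * (1.0 - overlap))))
    starts = range(0, signal.shape[0] - window_len + 1, step)
    return np.stack([signal[s:s + window_len] for s in starts])

# Example: a hypothetical 10 s insole recording at 100 Hz with 16 sensor channels.
recording = np.random.randn(1000, 16)
windows = segment_windows(recording, window_len=200, overlap=0.7)
print(windows.shape)  # (14, 200, 16) -> one training sample per window
```

Higher overlap yields more (correlated) training samples from the same recording, which is one reason results are reported separately per overlap setting.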