Although deep learning (DL) has made great strides in lipreading, it remains heavily underused for learning, understanding, and producing human language content. Current DL lipreading methods rely on single-channel processing and monolingual datasets, which limits their ability to adapt to cross-language applications. Here, we propose a novel lipreading-driven deep learning framework for cross-language learning. To evaluate the algorithm's cross-language learning ability, we present CELR-200, a lipreading dataset covering both Chinese and English that contains 200 word classes and more than 80,000 samples. We also propose two Spatio-Temporal Reconstructed 3D (STR-3D) convolutional kernels that reconstruct the spatio-temporal relations of 3D convolution. Using these two kernels, we build two new lipreading models, Serial-STRNet and Parallel-STRNet. The kernels reduce the number of 3D convolutional kernel parameters while improving performance: the two models reach 65.68% and 66.35% on CELR-200, respectively. They outperform other lipreading models, achieving an absolute improvement of 2.56% over the state-of-the-art model. Our results identify targets for future investigations and demonstrate that STR-3D convolutional kernels can provide critical insights into lipreading tasks.
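To give a sense of how restructuring a 3D convolution can cut kernel parameters, the sketch below compares the weight count of a full 3D kernel with that of a spatial-then-temporal factorization. The abstract does not specify how the STR-3D kernels are constructed, so the factorization, the function names, and the channel sizes here are illustrative assumptions and not the authors' design.

```python
def conv3d_params(c_in, c_out, kt, kh, kw):
    """Weight count of one 3D convolution layer (bias omitted)."""
    return c_in * c_out * kt * kh * kw


def factorized_params(c_in, c_out, k):
    """Assumed factorization: a 1 x k x k spatial convolution
    followed by a k x 1 x 1 temporal convolution."""
    spatial = conv3d_params(c_in, c_out, 1, k, k)
    temporal = conv3d_params(c_out, c_out, k, 1, 1)
    return spatial + temporal


# Hypothetical layer: 64 input channels, 64 output channels, kernel size 3.
full = conv3d_params(64, 64, 3, 3, 3)
fact = factorized_params(64, 64, 3)
print(full, fact, round(fact / full, 3))
```

Under these assumed sizes, the factorized layer uses fewer than half the weights of the full 3 x 3 x 3 kernel. The actual savings of the STR-3D kernels depend on their definition in the paper.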