Accurate prediction of moisture content is crucial for ensuring process stability and product quality in the cylinder drying process. However, the drying process exhibits complex spatio-temporal characteristics and strong interference, which make accurate prediction challenging for deep learning approaches. To address this issue, this article proposes a spatio-temporal attention-based bidirectional long short-term memory network (STA-BiLSTM) model for moisture content prediction. First, minimum redundancy maximum relevance (mRMR) feature selection is adopted to identify the features most relevant to moisture content. Second, a bidirectional long short-term memory (BiLSTM) network is used to extract temporal dependencies from the sequential data. Third, spatio-temporal attention mechanisms are designed to adaptively focus on the most relevant features and time steps, enhancing the model’s generalization ability. Finally, to cope with the harsh industrial environment, eXtreme Gradient Boosting (XGBoost) is adopted to improve generalizability and robustness. Extensive experiments on a real industrial dataset from the drying process demonstrate that the proposed STA-BiLSTM approach significantly outperforms alternative approaches in predicting moisture content, validating its effectiveness.
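As an illustration of the feature-selection step, the following is a minimal sketch of greedy mRMR-style selection, not the configuration used in this work. It assumes the absolute Pearson correlation as a simple stand-in for mutual information in both the relevance and redundancy terms, and the names `pearson` and `mrmr_select` are hypothetical helpers introduced only for this example.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences.

    Assumes neither sequence is constant (non-zero variance).
    """
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def mrmr_select(columns, target, k):
    """Greedily pick up to k feature indices from a list of feature columns.

    Score = relevance - mean redundancy, where (as an assumption of this
    sketch) both terms use |Pearson correlation| instead of mutual information.
    """
    n_features = len(columns)
    # Relevance of each feature to the target variable.
    relevance = [abs(pearson(col, target)) for col in columns]
    # Start with the single most relevant feature.
    selected = [max(range(n_features), key=lambda j: relevance[j])]
    while len(selected) < min(k, n_features):
        best, best_score = None, -math.inf
        for j in range(n_features):
            if j in selected:
                continue
            # Mean redundancy of candidate j with the already selected features.
            redundancy = sum(
                abs(pearson(columns[j], columns[s])) for s in selected
            ) / len(selected)
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected
```

Under this scoring, a candidate that nearly duplicates an already selected feature is penalized by its high redundancy, so a less relevant but complementary feature is preferred over it.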