Tool Wear Prediction (TWP) is crucial for ensuring machining quality. Traditional prediction methods rely on full-life or failure data, but labelled data are limited or nonexistent when a new tool or a similar tool type is introduced, which makes it extremely challenging to predict wear from limited monitoring data. This work aims to provide an innovative method for accurate long-term wear prediction in milling processes using limited monitoring data from individual tools. First, Variational Mode Decomposition (VMD) and a Pearson Correlation Threshold (PCT) are combined to adaptively filter the machining signals. Second, a fitness analysis selects the features most strongly related to tool degradation using feature monotonicity metrics, and the sensitive information is fused using Kernel Principal Component Analysis (KPCA). Third, a Stacked Bi-directional Long Short-Term Memory model with an attention mechanism (AT-SBiLSTM) is constructed to map the sensitive features to the future wear state of the tool through multi-step-ahead rolling prediction. Results from several milling experiments show that the proposed approach provides accurate forecasts of tool wear using limited monitoring data.
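
To make the filtering and feature-selection steps above concrete, the following is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes the decomposed modes are already available (the decomposition itself is omitted), that the correlation threshold is applied to the Pearson coefficient between each mode and the raw signal, and that monotonicity is measured with one commonly used definition (the absolute difference between the counts of increasing and decreasing steps, divided by the number of steps). The function names and the threshold value of 0.3 are hypothetical.

```python
import numpy as np


def select_modes_by_pearson(signal, modes, threshold=0.3):
    """Return indices of modes whose |Pearson r| with the raw signal
    reaches the threshold (threshold value is an assumption)."""
    signal = np.asarray(signal, dtype=float)
    keep = []
    for i, mode in enumerate(modes):
        r = np.corrcoef(signal, np.asarray(mode, dtype=float))[0, 1]
        if abs(r) >= threshold:
            keep.append(i)
    return keep


def monotonicity(feature):
    """Assumed metric: |#increases - #decreases| / (n - 1).
    1.0 means strictly monotonic; 0.0 means no net trend."""
    diffs = np.diff(np.asarray(feature, dtype=float))
    if diffs.size == 0:
        return 0.0
    n_up = int(np.sum(diffs > 0))
    n_down = int(np.sum(diffs < 0))
    return abs(n_up - n_down) / diffs.size


# Synthetic illustration: one dominant mode and one weak mode.
t = np.linspace(0.0, 1.0, 200, endpoint=False)
mode_strong = np.sin(2 * np.pi * 5 * t)
mode_weak = 0.01 * np.sin(2 * np.pi * 50 * t)
raw = mode_strong + mode_weak

kept = select_modes_by_pearson(raw, [mode_strong, mode_weak])
filtered = sum([mode_strong, mode_weak][i] for i in kept)
```

In this sketch the filtered signal is rebuilt by summing the retained modes, and a feature computed over successive cuts would then be kept only if its monotonicity score is high. How the threshold is chosen adaptively and which monotonicity metrics are used are details of the full method that this sketch does not attempt to reproduce.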