Effective tool wear monitoring (TWM) is crucial for accurately assessing the degree of tool wear, guiding tool replacement during cutting, ensuring stable machine operation, and improving workpiece quality. With the arrival of the big data era, data-driven monitoring methods are increasingly applied to TWM, but they suffer from over-reliance on manual feature extraction and selection, low robustness in real industrial environments, and poor generalization across machining processes. To address these problems, this paper proposes MODC-MMFL, an end-to-end integrated TWM network model that combines multi-scale one-dimensional convolution (MODC) with multi-model fusion learning (MMFL). First, a MODC network adaptively extracts multi-scale local features from multi-sensor signals to achieve multi-feature fusion. Then, an MMFL network composed of a deep attention temporal convolutional network (DATCN) and a stacked bidirectional gated recurrent unit network (SBIGRU) learns, in parallel, the time-series features related to tool wear, and a fusion layer fuses the learned features; within the DATCN, a residual channel attention mechanism (RCAM) is used to improve network performance. Finally, a fully connected regression network (FCR) outputs the predicted tool wear value. The proposed model is evaluated experimentally on the PHM tool wear dataset, first to verify its effectiveness. Ablation experiments were then conducted to investigate the impact of hyper-parameters on predictive performance. The model was enhanced through hyper-parameter tuning, and a generalized enhanced model was established. The experimental results showed that the enhanced model had better predictive performance than ordinary models.
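As a rough illustration of the multi-scale one-dimensional convolution idea, the sketch below filters a single signal with kernels of several widths and keeps each output as a separate feature channel. The kernel sizes (3, 5, 7), the fixed averaging weights, and the function names are assumptions made for illustration only; in the proposed model the convolution weights are learned and the input comes from multiple sensors.

```python
def conv1d_same(signal, kernel):
    """1-D cross-correlation with zero padding, so output length equals input length."""
    k = len(kernel)
    pad = k // 2  # assumes an odd kernel length
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(padded[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal))
    ]


def multi_scale_features(signal, kernel_sizes=(3, 5, 7)):
    """Return one feature channel per kernel size (sizes are hypothetical)."""
    channels = []
    for k in kernel_sizes:
        kernel = [1.0 / k] * k  # fixed averaging weights; a real network learns these
        channels.append(conv1d_same(signal, kernel))
    return channels


# Toy single-sensor signal; each channel has the same length as the input.
signal = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
features = multi_scale_features(signal)
```

Stacking the channels gives later layers views of the same signal at several temporal scales, which is the sense in which the extracted features are multi-scale.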
To assess robustness, Gaussian noise is added to the original signals of the PHM tool wear dataset to simulate the high-noise signals of real industrial environments, and the enhanced model is evaluated on these noisy signals. The results show that the enhanced model retains good predictive performance under high noise and is highly robust to real industrial conditions. The proposed model is then evaluated on the NASA tool wear dataset. The results show that it predicts well across different machining processes, verifying its generalizability. In summary, the proposed model accurately predicts tool wear values from process monitoring information and offers good predictive performance, anti-interference ability, and environmental adaptability, making it well suited to practical industrial applications.
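The noise-injection step can be sketched as follows. This is a minimal example that assumes the noise level is specified as a signal-to-noise ratio (SNR) in decibels; the abstract does not state how the noise level was chosen, so the `snr_db` parameter, the function name, and the synthetic sine signal are all hypothetical.

```python
import math
import random


def add_gaussian_noise(signal, snr_db, seed=None):
    """Add zero-mean Gaussian noise so the result has roughly the given SNR (dB)."""
    rng = random.Random(seed)  # local generator, so results are reproducible
    signal_power = sum(x * x for x in signal) / len(signal)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


# Synthetic stand-in for one sensor channel (not data from the PHM dataset).
clean = [math.sin(2 * math.pi * 5 * t / 1000) for t in range(1000)]
noisy = add_gaussian_noise(clean, snr_db=0, seed=0)
```

At 0 dB the noise power equals the signal power; lowering `snr_db` further would simulate a harsher environment.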