Tool wear monitoring plays a key role in the machining industry by increasing productivity and reducing downtime. Data-driven monitoring techniques have been successfully applied to tool wear prediction in recent years. However, harsh machining environments and varying machining parameters degrade data quality and complicate the data distribution, which limits the performance of data-driven prediction models. Building a model that adapts to the different data distributions arising under different working conditions is therefore an important research topic. To address this problem, this paper proposes a new deep multi-task network based on sparse feature learning for tool wear prediction. By introducing L2,1-norm and Frobenius-norm (F-norm) regularization terms into the network loss function, the model learns a sparse representation of tool wear characteristics from the input frequency spectrum. Experimental results on a machine tool show that the proposed method significantly improves predictive accuracy and numerical stability.
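
To make the two regularizers concrete, the following is a minimal sketch of how an L2,1-norm and a squared F-norm penalty could be added to a data-fitting loss. It is not the network described in this paper. The function names, the split into a shared weight matrix `W_shared` and a task-specific matrix `W_task`, the assignment of each norm to each matrix, and the coefficients `lam1` and `lam2` are all assumptions made for illustration. The L2,1 norm sums the Euclidean norms of the rows, so minimizing it tends to drive whole rows (for example, all weights attached to one frequency bin) to zero, which is the usual route to row-wise sparsity.

```python
import math

def l21_norm(W):
    """L2,1 norm: sum over rows of each row's Euclidean (L2) norm."""
    return sum(math.sqrt(sum(w * w for w in row)) for row in W)

def frobenius_norm(W):
    """Frobenius norm: square root of the sum of all squared entries."""
    return math.sqrt(sum(w * w for row in W for w in row))

def regularized_loss(data_loss, W_shared, W_task, lam1=1e-3, lam2=1e-3):
    """Data loss plus the two penalties.

    Assumption (not from the paper): the L2,1 penalty is applied to a
    shared weight matrix and the squared F-norm to a task-specific one.
    """
    sparsity_penalty = lam1 * l21_norm(W_shared)
    stability_penalty = lam2 * frobenius_norm(W_task) ** 2
    return data_loss + sparsity_penalty + stability_penalty

# Hypothetical weights: rows index input frequency bins, columns hidden units.
W = [[3.0, 4.0],
     [0.0, 0.0],   # an all-zero row contributes nothing to the L2,1 norm
     [6.0, 8.0]]
total = regularized_loss(1.0, W, W, lam1=0.1, lam2=0.01)
```

In this sketch the L2,1 term encourages feature selection across the spectrum, while the squared F-norm acts as ordinary weight decay, which is one common way to improve numerical stability.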