Many real-world applications require multi-way feature selection rather than single-way feature selection. Multi-way feature selection is more challenging than its single-way counterpart because the multi-way features are inter-correlated. To address this challenge, we propose a novel non-negative matrix tri-factorization model based on co-sparsity regularization, which facilitates feature co-shrinking for co-clustering. The basic idea is to learn the inter-correlation among the multi-way features while shrinking the irrelevant ones by encouraging co-sparsity of the model parameters. The objective simultaneously minimizes the loss function of the matrix tri-factorization and the co-sparsity regularization imposed on the model. Furthermore, we develop an efficient algorithm with guaranteed convergence to solve the resulting non-smooth optimization problem; it proceeds by iterative updates. Experimental results on various data sets demonstrate the effectiveness of the proposed approach.
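
As an illustration only, the following Python sketch shows a generic non-negative matrix tri-factorization, X ≈ F S Gᵀ, fitted by multiplicative updates, with a plain L1 penalty on the two factor matrices F and G standing in for a sparsity-inducing term. The L1 penalty, the update rules, the function names `objective` and `nmtf_l1`, and all parameters are assumptions made for this sketch. They are not the co-sparsity regularizer or the algorithm proposed in this work.

```python
import numpy as np


def objective(X, F, S, G, lam):
    """0.5 * ||X - F S G^T||_F^2 + lam * (||F||_1 + ||G||_1), for non-negative F and G."""
    residual = X - F @ S @ G.T
    return 0.5 * np.sum(residual ** 2) + lam * (F.sum() + G.sum())


def nmtf_l1(X, k, l, lam=0.1, n_iter=200, eps=1e-9, seed=0):
    """Illustrative sparse tri-factorization of a non-negative matrix X (n x m).

    Returns F (n x k), S (k x l), G (m x l) and the objective value after each sweep.
    ASSUMPTION: the L1 penalty is a simple stand-in, not the paper's co-sparsity term.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    # Strictly positive initialization keeps the multiplicative updates well defined.
    F = rng.random((n, k)) + 0.1
    S = rng.random((k, l)) + 0.1
    G = rng.random((m, l)) + 0.1
    history = []
    for _ in range(n_iter):
        # Update F with S and G fixed; lam in the denominator shrinks entries of F.
        GSt = G @ S.T                                  # m x k
        F *= (X @ GSt) / (F @ (GSt.T @ GSt) + lam + eps)
        # Update G with F and S fixed; same shrinkage on the other mode.
        FS = F @ S                                     # n x l
        G *= (X.T @ FS) / (G @ (FS.T @ FS) + lam + eps)
        # Update the unpenalized core matrix S; eps only guards against division by zero.
        S *= (F.T @ X @ G) / ((F.T @ F) @ S @ (G.T @ G) + eps)
        history.append(objective(X, F, S, G, lam))
    return F, S, G, history


if __name__ == "__main__":
    # Synthetic non-negative data with low-rank structure (hypothetical sizes).
    rng = np.random.default_rng(1)
    X = rng.random((30, 4)) @ rng.random((4, 3)) @ rng.random((3, 20))
    F, S, G, history = nmtf_l1(X, k=4, l=3, lam=0.05, n_iter=100)
    print("first objective:", history[0])
    print("last objective:", history[-1])
```

In this sketch, rows of F and G play the role of the two feature modes, and the shared core matrix S couples them. Because each update only multiplies by a non-negative ratio, the factors stay non-negative throughout, and larger values of `lam` push more entries of F and G toward zero.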