Feature selection is an important pre-processing method for dimensionality reduction in multi-label learning and has attracted extensive attention in recent years. Most existing approaches map feature data into the label space by assuming a linear relationship between the feature and label spaces. However, the linearity assumption does not hold in most cases, especially in high-dimensional spaces. This work proposes a novel dimensionality reduction model for multi-label data based on nonlinear mapping (NMFS). The model introduces a point-to-point sigmoid function to describe the intrinsic relationship between the data space and the label space. By limiting the range of the transformed data to the interval [0, 1], which is consistent with the predicted values of the labels, the proposed method improves the generalization ability of feature selection. The feature weight matrix is constrained by the l2,1-norm to ensure its sparsity, which forms the basis of feature selection. The variables of the NMFS model are updated iteratively with a gradient momentum optimization strategy, yielding a sparse weight-coefficient matrix that is used to rank features for multi-label data. Experimental results on 14 multi-label data sets verify the effectiveness of the proposed method and show that it outperforms state-of-the-art multi-label feature selection methods.
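
The following is a minimal sketch of the pipeline the abstract outlines (sigmoid mapping, l2,1-norm penalty, momentum updates, feature ranking), not the NMFS model itself. The abstract does not state the objective function, so the sketch assumes a squared-error loss between sigmoid(XW) and the label matrix Y, a smoothed l2,1-norm penalty, and classical heavy-ball momentum. All function names, parameters, and default values (`fit_sketch`, `lam`, `lr`, `beta`, `n_iter`) are hypothetical.

```python
import numpy as np

def sigmoid(z):
    # Element-wise logistic function; outputs lie in (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def l21_norm(W, eps=1e-8):
    # Sum of the Euclidean norms of the rows of W (smoothed by eps).
    return np.sum(np.sqrt(np.sum(W ** 2, axis=1) + eps))

def objective(X, Y, W, lam):
    # ASSUMED objective: 0.5 * ||sigmoid(XW) - Y||_F^2 + lam * ||W||_{2,1}.
    P = sigmoid(X @ W)
    return 0.5 * np.sum((P - Y) ** 2) + lam * l21_norm(W)

def fit_sketch(X, Y, lam=0.1, lr=0.01, beta=0.9, n_iter=500, seed=0, eps=1e-8):
    # X: (n_samples, n_features), Y: (n_samples, n_labels) with entries in {0, 1}.
    rng = np.random.default_rng(seed)
    W = 0.01 * rng.standard_normal((X.shape[1], Y.shape[1]))
    V = np.zeros_like(W)  # momentum (velocity) term
    for _ in range(n_iter):
        P = sigmoid(X @ W)
        # Gradient of the squared-error term through the sigmoid.
        grad_fit = X.T @ ((P - Y) * P * (1.0 - P))
        # Gradient of the smoothed l2,1-norm: each row divided by its norm.
        row_norms = np.sqrt(np.sum(W ** 2, axis=1, keepdims=True) + eps)
        grad_reg = lam * W / row_norms
        # Heavy-ball momentum update (assumed form of "gradient momentum").
        V = beta * V - lr * (grad_fit + grad_reg)
        W = W + V
    return W

def rank_features(W):
    # Score each feature by the l2-norm of its row in W; larger is more important.
    scores = np.linalg.norm(W, axis=1)
    return np.argsort(-scores)
```

Under these assumptions, `rank_features(fit_sketch(X, Y))` returns feature indices ordered from most to least important, and keeping the first k indices gives the selected subset.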