Rolling bearings are essential components of manufacturing machinery. Large numbers of features are often extracted from measured signals to reflect bearing condition comprehensively, which can lead to high dimensionality, information redundancy, and long computation times. In addition, collecting labeled samples for bearing fault diagnosis in real-world scenarios is extremely difficult, expensive, and time-consuming. In this study, a novel bearing fault diagnosis method for small sample sizes is proposed based on modified local joint sparse marginal embedding (MLJSME) and Wasserstein generative adversarial networks (WGANs). MLJSME effectively extracts the intrinsic sparse discriminant features of a high-dimensional dataset by preserving both its global and local structures. Graph embedding and a Gaussian kernel function are adopted to preserve the local structure of the dataset. The global structure and discriminant information are preserved by the maximum margin criterion, which also avoids the small-sample-size problem. Moreover, joint sparsity is applied to preserve the sparse property and improve robustness to noise and outliers. With a WGAN and a few labeled samples, an abundance of artificial samples can be generated. First, a high-dimensional feature dataset consisting of time-domain and frequency-domain features is extracted from the original vibration signals. MLJSME is then used to extract sensitive low-dimensional features. Next, a small number of these low-dimensional features are fed into the WGAN to generate a large number of artificial samples, which are used to train the classifier. Finally, the bearing fault types are identified. The effectiveness and feasibility of the proposed method are validated on several experimental cases.
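As an illustration of the first step of the pipeline (building a feature vector from a raw vibration signal), the following is a minimal sketch in Python. The abstract does not list which time-domain and frequency-domain features are used, so the particular statistics chosen here (mean, RMS, peak, kurtosis, crest factor, spectral centroid, peak frequency) are assumptions, as are the function names `time_domain_features`, `frequency_domain_features`, and `extract_features`. A naive discrete Fourier transform is used to keep the sketch dependency-free; a practical implementation would more likely use an FFT routine.

```python
import cmath
import math


def time_domain_features(signal):
    """Return a few common time-domain statistics of a 1-D signal.

    The feature set is an assumption for illustration only.
    """
    n = len(signal)
    mean = sum(signal) / n
    rms = math.sqrt(sum(x * x for x in signal) / n)
    var = sum((x - mean) ** 2 for x in signal) / n
    peak = max(abs(x) for x in signal)
    # Kurtosis (non-excess): fourth central moment over squared variance.
    fourth = sum((x - mean) ** 4 for x in signal) / n
    kurtosis = fourth / (var ** 2) if var > 0 else 0.0
    # Crest factor: ratio of peak amplitude to RMS.
    crest = peak / rms if rms > 0 else 0.0
    return {
        "mean": mean,
        "rms": rms,
        "peak": peak,
        "kurtosis": kurtosis,
        "crest_factor": crest,
    }


def frequency_domain_features(signal, fs):
    """Return simple spectral statistics; fs is the sampling rate in Hz."""
    n = len(signal)
    half = n // 2
    # Naive O(n^2) DFT magnitudes for the non-negative frequency bins.
    mags = []
    for k in range(half + 1):
        acc = sum(
            signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        mags.append(abs(acc))
    freqs = [k * fs / n for k in range(half + 1)]
    total = sum(mags)
    # Spectral centroid: magnitude-weighted mean frequency.
    centroid = sum(f * m for f, m in zip(freqs, mags)) / total if total > 0 else 0.0
    # Frequency of the bin with the largest magnitude.
    peak_bin = max(range(len(mags)), key=mags.__getitem__)
    return {"spectral_centroid": centroid, "peak_frequency": freqs[peak_bin]}


def extract_features(signal, fs):
    """Concatenate time-domain and frequency-domain features into one dict."""
    feats = time_domain_features(signal)
    feats.update(frequency_domain_features(signal, fs))
    return feats
```

Calling `extract_features(segment, fs)` on each signal segment and stacking the resulting values row by row would give the kind of high-dimensional feature matrix that a dimensionality reduction step such as MLJSME takes as input.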