As the size of software projects becomes larger, software defect prediction (SDP) will play a key role in allocating testing resources reasonably, reducing testing costs, and speeding up the development process. Most SDP methods have used machine learning techniques based on common software metrics such as Halstead and McCabe’s cyclomatic. Datasets produced by these metrics usually do not follow Gaussian distribution, and also, they have overlaps in defect and non-defect classes. In addition, in many of software defect datasets, the number of defective modules (minority class) is considerably less than non-defective modules (majority class). In this situation, the performance of machine learning methods is reduced dramatically. Therefore, we first need to create a balance between minority and majority classes and then transfer the samples into a new space in which pair samples with same class (must-link set) are near to each other as close as possible and pair samples with different classes (cannot-link) stay as far as possible. To achieve the mentioned objectives, in this paper, Mahalanobis distance in two manners will be used. First, the minority class is oversampled based on the Mahalanobis distance such that generated synthetic data are more diverse from other minority data, and minority class distribution is not changed significantly. Second, a feature extraction method based on Mahalanobis distance metric learning is used which try to minimize distances of sample pairs in must-links and maximize the distance of sample pairs in cannot-links. To demonstrate the effectiveness of the proposed method, we performed some experiments on 12 publicly available datasets which are collected NASA repositories and compared its result by some powerful previous methods. The performance is evaluated in F-measure, G-Mean, and Matthews correlation coefficient. Generally, the proposed method has better performance as compared to the mentioned methods.