Instance selection, or outlier detection, is an important data mining task that focuses on filtering bad data out of a given dataset. However, there is no rigid mathematical definition of what constitutes an outlier, and being an outlier is not a binary property. Consequently, different numbers of outliers may be detected depending on the threshold chosen for what constitutes an outlier, e.g., the distance in distance-based outlier detection. In this study, we examine bankruptcy prediction performance after the removal of different proportions of outliers from four widely used datasets: the Australian, German, Japanese, and UC Competition datasets. Specifically, a simple distance-based clustering outlier detection method is used, and four popular classification techniques are compared: artificial neural networks, decision trees, logistic regression, and support vector machines (SVM). Experiments are conducted to examine (1) the prediction performance of the bankruptcy prediction models with and without instance selection, (2) the stability of the models after the removal of outliers from the testing set, and (3) the characteristics of the four datasets. The results show that after outlier removal it is much more difficult for the prediction models to achieve high accuracy on the German dataset, whereas it is easier on the UC Competition dataset. Removing 50% of the outliers leads to optimal performance for all four models. In addition, when the removed outliers are used to test the prediction accuracy of these models, SVM provides the highest accuracy and exhibits much greater stability and noise tolerance than the other three prediction models. Furthermore, the prediction accuracy of the SVM model with instance selection is similar to that of the SVM model without it (i.e., the SVM baseline); in other words, among the four models, SVM shows the smallest performance gap relative to its corresponding baseline.
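The distance-based outlier removal described above can be sketched as follows. The abstract does not specify the exact algorithm, so this is a minimal illustrative assumption: instances are ranked by Euclidean distance to the dataset centroid, and the farthest fraction (e.g., 50%) is treated as outliers. The function name `split_outliers` and the centroid criterion are hypothetical, not the authors' implementation.

```python
import numpy as np

def split_outliers(X, fraction=0.5):
    """Rank instances by Euclidean distance to the centroid and
    split off the given fraction with the largest distances.

    Returns (kept_indices, outlier_indices)."""
    centroid = X.mean(axis=0)                      # center of the dataset
    dist = np.linalg.norm(X - centroid, axis=1)    # distance of each instance
    order = np.argsort(dist)                       # closest first
    cut = len(X) - int(round(fraction * len(X)))   # how many instances to keep
    return order[:cut], order[cut:]

# Toy example: four clustered points plus one obvious outlier.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [10.0, 10.0]])
kept, outliers = split_outliers(X, fraction=0.2)
```

In the study's setup, the removed instances would be set aside and later reused as a noisy testing set to probe each model's stability.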