Machine learning is a powerful data-driven technology for regression and classification tasks in numerous branches of science and technology. With probabilistic machine learning, we can approximate the probability density function of a given digital data set, which makes it possible to sample new, realistic data. We use probabilistic machine learning as a statistics-based data augmentation tool to improve the performance of a classifier trained on scarce data sets. Limited data are the norm in most practical situations, as is the case with core samples retrieved from unconventional oil and gas reservoirs, because obtaining and properly labeling rock sample images is a manual and time-consuming task. The resulting scarce data sets make the lithologic classification of rock samples challenging. We illustrate the technology on images of well cores from the Argentinean Neuquén and Austral basins. By introducing generated samples in the training stage of a machine-learning classifier, we achieve 5% higher accuracy than the baseline obtained using the same number of samples from the original data set. We reach these results with a lightweight workflow that can run on very limited hardware. Our workflow is an alternative to classical data augmentation strategies, such as geometric transformations, for dealing with incomplete information.
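The general idea of density-based augmentation (estimate a class-conditional probability density, sample from it, and append the samples to the training set) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' workflow: the abstract does not specify the generative model, so a per-class diagonal Gaussian is swapped in as the simplest density estimator. The inputs are assumed to be numeric feature vectors (for example, descriptors extracted from core images). The function names and the class labels `class_a`/`class_b` are hypothetical.

```python
import random
import statistics
from collections import defaultdict


def fit_class_gaussians(features, labels):
    """Fit a diagonal Gaussian (per-feature mean and std) to each class.

    Assumption: a diagonal Gaussian stands in for whatever probabilistic
    model the original workflow uses.
    """
    by_class = defaultdict(list)
    for x, y in zip(features, labels):
        by_class[y].append(x)
    params = {}
    for y, rows in by_class.items():
        columns = list(zip(*rows))  # transpose: one tuple per feature
        means = [statistics.fmean(c) for c in columns]
        # Floor the std so a constant feature still yields a valid Gaussian.
        stds = [max(statistics.pstdev(c), 1e-6) for c in columns]
        params[y] = (means, stds)
    return params


def sample_synthetic(params, n_per_class, seed=0):
    """Draw n_per_class synthetic feature vectors from each class density."""
    rng = random.Random(seed)
    new_x, new_y = [], []
    for y, (means, stds) in params.items():
        for _ in range(n_per_class):
            new_x.append([rng.gauss(m, s) for m, s in zip(means, stds)])
            new_y.append(y)
    return new_x, new_y


def augment(features, labels, n_per_class, seed=0):
    """Return the original data followed by the generated samples."""
    params = fit_class_gaussians(features, labels)
    gen_x, gen_y = sample_synthetic(params, n_per_class, seed)
    return [list(x) for x in features] + gen_x, list(labels) + gen_y


if __name__ == "__main__":
    # Hypothetical toy data: two classes with two features each.
    X = [[0.1, 1.0], [0.2, 1.1], [0.9, 0.1], [1.0, 0.2]]
    y = ["class_a", "class_a", "class_b", "class_b"]
    aug_X, aug_y = augment(X, y, n_per_class=3)
    print(len(aug_X), aug_y.count("class_a"), aug_y.count("class_b"))
```

The augmented set would then be passed to any classifier in place of the original training set. A richer density model (for instance, a mixture model or a deep generative model) could replace `fit_class_gaussians` without changing the rest of the pipeline.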