Abstract

Artificial infrared spectra were generated to match the differing acid solubility properties of rocks. The aim was to increase the number of samples available for subsequent data processing with a convolutional neural network. The samples were collected from different geological matrices during targeted rock tests supporting industrial applications. Two characteristics are inherent to such samples: they are unevenly distributed in the feature parameter space, and too few are available for data-intensive studies. Both characteristics limit how well machine learning methods can predict the unknown solubility of samples in the chosen acids. If sample multiplication techniques are applied without considering the relationship between the solubility of the samples and their infrared spectra, the synthetic samples degrade the performance of the prediction method. Using a dimensionality reduction technique (principal component analysis) and a neural network, we established a relationship between the solubility of the samples and their infrared spectra. The infrared spectra of the samples used to train the model could be reproduced efficiently, and infrared spectra could be generated for newly created samples. The reliability of the method was demonstrated by comparing the original and artificial spectra using the mean Pearson correlation coefficient and by comparing nearest neighbors with each other. The method can be used to create new samples together with their infrared spectra in settings where different constraints must be met and each sample must be linked to an infrared spectrum.
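The workflow summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic stand-ins (two Gaussian bands whose heights depend on two hypothetical solubility values), and the number of principal components, the network size, and the training settings are all assumptions chosen for brevity. The sketch compresses spectra with principal component analysis, trains a small neural network to map solubility to the component scores, reconstructs spectra from predicted scores, and evaluates them with the mean Pearson correlation coefficient and a nearest-neighbor comparison.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (assumption, not the paper's measurements).
n_samples, n_channels = 60, 200
solubility = rng.uniform(0.0, 1.0, size=(n_samples, 2))
axis = np.linspace(0.0, 1.0, n_channels)
band_a = np.exp(-((axis - 0.3) ** 2) / 0.005)
band_b = np.exp(-((axis - 0.7) ** 2) / 0.01)
spectra = ((1.0 + solubility[:, [0]]) * band_a
           + (1.0 + solubility[:, [1]]) * band_b
           + 0.01 * rng.normal(size=(n_samples, n_channels)))

def fit_pca(X, k):
    """Return the mean spectrum and the first k principal axes (rows)."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]

def train_mlp(X, Y, hidden=16, lr=0.05, epochs=2000, seed=0):
    """One-hidden-layer tanh network trained by full-batch gradient descent."""
    r = np.random.default_rng(seed)
    W1 = r.normal(scale=0.5, size=(X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = r.normal(scale=0.5, size=(hidden, Y.shape[1]))
    b2 = np.zeros(Y.shape[1])
    n = X.shape[0]
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)
        err = H @ W2 + b2 - Y                 # gradient of 0.5 * squared error
        dH = (err @ W2.T) * (1.0 - H ** 2)    # backpropagate through tanh
        W2 -= lr * (H.T @ err) / n
        b2 -= lr * err.mean(axis=0)
        W1 -= lr * (X.T @ dH) / n
        b1 -= lr * dH.mean(axis=0)
    return lambda X_new: np.tanh(X_new @ W1 + b1) @ W2 + b2

def rowwise_pearson(A, B):
    """Pearson correlation between corresponding rows of A and B."""
    A0 = A - A.mean(axis=1, keepdims=True)
    B0 = B - B.mean(axis=1, keepdims=True)
    denom = np.sqrt((A0 ** 2).sum(axis=1) * (B0 ** 2).sum(axis=1))
    return (A0 * B0).sum(axis=1) / denom

# 1) Reduce the spectra to a few PCA scores.
mean_spectrum, components = fit_pca(spectra, k=2)
scores = (spectra - mean_spectrum) @ components.T

# 2) Learn the map solubility -> PCA scores.
predict_scores = train_mlp(solubility, scores)

# 3) Reproduce the training spectra and compare them with the originals.
reproduced = mean_spectrum + predict_scores(solubility) @ components
mean_r = rowwise_pearson(spectra, reproduced).mean()

# 4) Generate artificial spectra for newly chosen solubility values.
new_solubility = rng.uniform(0.2, 0.8, size=(10, 2))
artificial = mean_spectrum + predict_scores(new_solubility) @ components

# 5) Compare each artificial spectrum with its nearest original sample
#    (nearest in solubility space).
dists = np.linalg.norm(new_solubility[:, None, :] - solubility[None, :, :], axis=2)
nearest = dists.argmin(axis=1)
neighbor_r = rowwise_pearson(artificial, spectra[nearest]).mean()

print(f"mean Pearson r (reproduced vs. original): {mean_r:.4f}")
print(f"mean Pearson r (artificial vs. nearest neighbor): {neighbor_r:.4f}")
```

Predicting a handful of PCA scores rather than the full spectrum keeps the network small, which matters when only a limited number of samples is available; the choice of two components here is specific to the synthetic data and would need to be selected from the explained variance on real spectra.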