Ink analysis plays an important role in document examination, but limited datasets make it difficult for many algorithms to distinguish inks accurately. This article evaluated the feasibility of two data augmentation (DA) methods, Gaussian noise data augmentation (GNDA) and extended multiplicative signal augmentation (EMSA), for classifying felt-tip pen ink brands. Four brands of felt-tip pens were analyzed using FT-IR spectroscopy. Five classification models were used: convolutional neural network (CNN), K-nearest neighbor (KNN), support vector machine (SVM), random forest (RF), and partial least squares discriminant analysis (PLS-DA). The results showed that the datasets generated by GNDA and EMSA were similar to the original datasets while adding some diversity. EMSA gave the best classification results when combined with CNN, with classification accuracy (ACC), precision (PRE), recall (REC), and F1 score reaching 99.86%, 99.87%, 99.86%, and 99.86%, respectively, compared with the GNDA-CNN method (ACC = 80.90%, PRE = 87.34%, REC = 81.62%, F1 score = 79.23%). This study shows that when the raw spectral dataset is small, DA methods can be combined with neural network models to identify ink brands effectively.
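As an illustration of the general idea behind GNDA, the following minimal sketch adds zero-mean Gaussian noise to each spectrum to produce extra training samples. It is not the authors' implementation: the function name `gaussian_noise_augment`, the parameters `copies` and `noise_scale`, and the choice to scale the noise by each spectrum's own standard deviation are all assumptions made for this example, since the abstract does not state the noise level or the number of generated spectra.

```python
import numpy as np


def gaussian_noise_augment(spectra, labels, copies=10, noise_scale=0.01, seed=0):
    """Return the original spectra plus `copies` noisy versions of each one.

    spectra: array of shape (n_samples, n_wavenumbers)
    labels:  array of shape (n_samples,)
    noise_scale: hypothetical setting; noise standard deviation expressed as
                 a fraction of each spectrum's own standard deviation.
    """
    rng = np.random.default_rng(seed)
    spectra = np.asarray(spectra, dtype=float)
    labels = np.asarray(labels)

    # One noise standard deviation per spectrum (assumed scaling rule).
    sigma = noise_scale * spectra.std(axis=1, keepdims=True)

    augmented_x = [spectra]
    augmented_y = [labels]
    for _ in range(copies):
        # Zero-mean Gaussian noise, drawn independently at every wavenumber.
        noise = rng.normal(0.0, 1.0, size=spectra.shape) * sigma
        augmented_x.append(spectra + noise)
        augmented_y.append(labels)

    # The originals come first, followed by each noisy copy in order.
    return np.vstack(augmented_x), np.concatenate(augmented_y)
```

With `copies=10`, a set of 40 measured spectra would grow to 440 labelled spectra, which could then be split into training and test sets for any of the five classifiers. To avoid leaking near-duplicates into the test set, one reasonable design choice is to split the measured spectra first and augment only the training portion.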