In this paper, we focus on images of wounds on human skin and propose to treat each image as a set of smaller pieces – crops, or patches, containing different textures. We review, develop, and compare deep learning feature extraction methods that model image crops as 200-dimensional feature vectors using several artificial neural network architectures: convolutional autoencoders, variational convolutional autoencoders, and Siamese convolutional networks trained with contrastive learning. We also develop a custom convolutional encoder and decoder, use them in the aforementioned architectures, and compare them with ResNet-based encoder and decoder alternatives. Finally, we train and evaluate k-nearest neighbors and Multi-Layer Perceptron classifiers on the features extracted with the model options above to discriminate between skin, wound, and background image patches. Features extracted with the Siamese network yield the best test accuracy for all implementations, with no significant difference between model versions (accuracy > 93%); variational autoencoders produce chance-level results for all options (accuracy around 33%); and convolutional autoencoders reach good results (accuracy > 77%), though with a noticeable gap between the custom and ResNet versions, the latter performing better. The custom encoder and decoder implementations are faster and smaller than the ResNet alternatives but may be less stable on larger datasets, which requires further investigation. Possible applications of the feature vectors include extracting areas of interest during wound segmentation or classification, and serving as patch embeddings when training vision transformer architectures.
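The patch-based pipeline described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the patch size, stride, and the encoder are hypothetical stand-ins, since the abstract does not specify them; in the paper, the encoder would be one of the trained models (convolutional autoencoder, variational autoencoder, or Siamese network), not the fixed random projection used here as a placeholder.

```python
import numpy as np

def extract_patches(image, patch_size=32, stride=32):
    """Split an H x W x C image into non-overlapping crops (patches).

    patch_size and stride are illustrative choices, not values from the paper.
    """
    h, w = image.shape[:2]
    patches = []
    for y in range(0, h - patch_size + 1, stride):
        for x in range(0, w - patch_size + 1, stride):
            patches.append(image[y:y + patch_size, x:x + patch_size])
    return np.stack(patches)

def encode(patch, dim=200):
    """Placeholder encoder mapping a patch to a 200-dimensional feature vector.

    A fixed random projection stands in for the trained neural encoders
    described in the abstract.
    """
    flat = patch.reshape(-1).astype(np.float64)
    proj = np.random.default_rng(42).standard_normal((dim, flat.size))
    return proj @ flat / flat.size

# Toy image in place of a wound photograph.
image = np.random.default_rng(1).random((128, 128, 3))
patches = extract_patches(image)
features = np.stack([encode(p) for p in patches])
print(features.shape)  # (16, 200): 16 patches, each a 200-d feature vector
```

The resulting feature matrix is what the k-nearest neighbors or Multi-Layer Perceptron classifiers would consume, one row per patch, to label patches as skin, wound, or background.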