Comparison of Unsupervised Learning Methods for Natural Image Processing

Wilfried Wöber,Harald Meimberg,Cristina Olaverri-Monreal,Peter Sykacek,Lars Mehnen,Papius Tibihika

doi:10.3897/biss.3.37886

Abstract

For computer vision based appraoches such as image classification (Krizhevsky et al. 2012), object detection (Ren et al. 2015) or pixel-wise weed classification (Milioto et al. 2017) machine learning is used for both feature extraction and processing (e.g. classification or regression). Historically, feature extraction (e.g. PCA; Ch. 12.1. in Bishop 2006) and processing were sequential and independent tasks (Wöber et al. 2013). Since the rise of convolutional neuronal networks (LeCun et al. 1989), a deep machine learning approach optimized for images, in 2012 (Krizhevsky et al. 2012), feature extraction for image analysis became an automated procedure. A convolutional neuronal net uses a deep architecture of artificial neurons (Goodfellow 2016) for both feature extraction and processing. Based on prior information such as image classes and supervised learning procedures, parameters of the neuronal nets are adjusted. This is known as the learning process. Simultaneously, geometric morphometrics (Tibihika et al. 2018, Cadrin and Friedland 1999) are used in biodiversity research for association analysis. Those approaches use deterministic two-dimensional locations on digital images (landmarks; Mitteroecker et al. 2013), where each position corresponds to biologically relevant regions of interest. Since this methodology is based on scientific results and compresses image content into deterministic landmarks, no uncertainty regarding those landmark positions is taken into account, which leads to information loss (Pearl 1988). Both, the reduction of this loss and novel knowledge detection, can be done using machine learning. Supervised learning methods (e.g., neuronal nets or support vector machines (Ch. 5 and 6. in Bishop 2006)) map data on prior information (e.g. labels). This increases the performance of classification or regression but affects the latent representation of the data itself. Unsupervised learning (e.g. latent variable models) uses assumptions concerning data structures to extract latent representations without prior information. Those representations does not have to be useful for data processing such as classification and due to that, the use of supervised and unsupervised machine learning and combinations of both, needs to be chosen carefully, according to the application and data. In this work, we discuss unsupervised learning algorithms in terms of explainability, performance and theoretical restrictions in context of known deep learning restrictions (Marcus 2018, Szegedy et al. 2014, Su et al. 2017). We analyse extracted features based on multiple image datasets and discuss shortcomings and performance for processing (e.g. reconstruction error or complexity measurement (Pincus 1997)) using the principal component analysis (Wöber et al. 2013), independent component analysis (Stone 2004), deep neuronal nets (auto encoders; Ch. 14 in Goodfellow 2016) and Gaussian process latent variable models (Titsias and Lawrence 2010, Lawrence 2005).

Full Text