Abstract

In recent years, advances in technology have produced datasets of increasing size, in both the number of samples and the number of features. Unfortunately, building a sufficiently large set of adequately labeled data, with enough examples for each class, is not easy. Labeling is challenging, expensive, and time-consuming, and it is usually done manually, which may introduce noise and errors into the data. Hence, it is important to put forward models that can benefit from the distinct information that labeled and unlabeled data each provide, since many applications have plentiful unlabeled data but too few labeled samples. Semi-Supervised Learning (SSL), which lies halfway between supervised and unsupervised learning, addresses this setting. In this context, we highlight two influential models: Self-Organizing Maps (SOM) and Learning Vector Quantization (LVQ). SOM is a biologically inspired neural model that uses unsupervised, incremental learning to produce prototypes of the input data, whereas LVQ can be seen as its supervised counterpart. Because SOM is purely unsupervised, it cannot perform SSL on its own. The current work therefore proposes new models that incorporate standard concepts from LVQ into the SOM algorithm to build semi-supervised approaches. These models can dynamically switch between the two types of learning at training time, according to the availability of labels, and automatically adjust themselves to the local variance observed in each cluster. The experimental results show that the proposed models can surpass the performance of other traditional methods, not only in classification but also in clustering quality. This work also extends the range of applications of SOM- and LVQ-based models by combining them with Deep Learning in a synergistic way, which allows them to handle complex data structures such as images and sound.
Moreover, we explore ways of learning good representations of the input data and ways of estimating the unsupervised error when no labels are provided. Our approaches proved effective at producing a meaningful topology and clustering prototypes that appropriately represent the data.
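
To make the idea of switching between the two types of learning concrete, the following is a minimal sketch and not the models proposed in this work. It assumes a plain winner-only update: an unlabeled sample attracts the nearest prototype (a SOM-like step without a neighborhood function), while a labeled sample attracts or repels it in the style of the classic LVQ1 rule. The names `nearest` and `train_step`, the dictionary layout of a prototype, the learning rate `lr`, and the rule that an unlabeled prototype adopts the first label it wins are all hypothetical choices made for illustration; the adaptation to local variance is omitted entirely.

```python
import math

def nearest(prototypes, x):
    """Return the index of the prototype closest to x (Euclidean distance)."""
    return min(range(len(prototypes)),
               key=lambda i: math.dist(prototypes[i]["w"], x))

def train_step(prototypes, x, y=None, lr=0.1):
    """Update the winning prototype for one sample; y=None means unlabeled."""
    p = prototypes[nearest(prototypes, x)]
    if y is None or p["label"] is None or p["label"] == y:
        # Unsupervised step, or labels agree: pull the prototype toward x.
        sign = 1.0
        if y is not None and p["label"] is None:
            # Assumption: an unlabeled prototype adopts the sample's label.
            p["label"] = y
    else:
        # Labels disagree: LVQ1-style repulsion away from x.
        sign = -1.0
    p["w"] = [w + sign * lr * (xi - w) for w, xi in zip(p["w"], x)]
    return p

# Two prototypes in 2-D, both initially without a class label.
prototypes = [{"w": [0.0, 0.0], "label": None},
              {"w": [1.0, 1.0], "label": None}]

train_step(prototypes, [0.2, 0.0])         # no label: unsupervised update
train_step(prototypes, [0.9, 1.0], y="b")  # label given: supervised update
```

The mode is chosen per sample, so labeled and unlabeled data can be interleaved freely in a single training pass.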
