Abstract

In many machine learning applications, measurements are incomplete or noisy, resulting in missing features. In other cases, and for different reasons, the available datasets are originally small, so that more data samples are required to derive useful supervised or unsupervised classification methods. Correctly handling incomplete, noisy or small datasets is a fundamental and classic challenge in machine learning. In this article, we provide a unified review of recently proposed methods based on signal decomposition for missing-feature imputation (data completion), classification of noisy samples and artificial generation of new data samples (data augmentation). We illustrate the application of these signal decomposition methods in selected practical machine learning examples, including brain computer interfaces, classification of epileptic intracranial electroencephalogram signals, face recognition/verification and water network data analysis. We show that a signal decomposition approach can provide valuable tools to improve machine learning performance with low quality datasets.
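As a minimal illustration of the data-completion idea, the sketch below imputes missing features with an iterative truncated SVD, a simple instance of signal decomposition; the synthetic data, the assumed rank and the iteration count are illustrative choices, not the specific algorithms reviewed in the article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic low-rank data matrix (samples x features) -- illustrative only.
X_true = rng.standard_normal((100, 20)) @ rng.standard_normal((20, 15))
mask = rng.random(X_true.shape) < 0.2             # 20% of entries missing
X = X_true.copy()
X[mask] = np.nan

# Iterative low-rank (truncated SVD) imputation: a simple signal
# decomposition approach to missing-feature completion.
rank = 5                                          # assumed rank of the data
X_hat = np.where(mask, np.nanmean(X, axis=0), X)  # init gaps with column means
for _ in range(50):
    U, s, Vt = np.linalg.svd(X_hat, full_matrices=False)
    low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
    X_hat = np.where(mask, low_rank, X)           # observed entries stay fixed

print("RMSE on missing entries:",
      np.sqrt(np.mean((X_hat[mask] - X_true[mask]) ** 2)))
```

Observed entries are reinserted at every iteration, so the low-rank model is only used to fill the gaps.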

Highlights

  • Machine learning (ML) has been developing continuously since its beginnings in the middle of the 20th century with the introduction of the first computers

  • In a Brain Computer Interface (BCI) application, a user generates specific brain activity patterns that can be decoded by a machine learning algorithm

  • One popular BCI paradigm is Motor Imagery (MI): when a subject imagines moving one of their limbs for a few seconds, the resulting brain activity activates areas of the motor cortex similar to those activated by real movements, and this particular neural activity can be detected by a machine learning algorithm (a minimal decoding sketch follows this list)
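As a hedged illustration of such a decoding pipeline, the sketch below classifies synthetic two-class trials with common spatial patterns (CSP) followed by linear discriminant analysis, a standard MI decoding chain; the channel and trial counts and the simulated class difference are assumptions for illustration only, not data or methods from the article.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

# Synthetic MI-like data: 100 trials x 8 channels x 256 time samples,
# two classes whose spatial power distribution differs -- illustrative only.
n_trials, n_ch, n_t = 100, 8, 256
y = rng.integers(0, 2, n_trials)
X = rng.standard_normal((n_trials, n_ch, n_t))
X[y == 1, 0] *= 3.0                    # class 1 has more power on channel 0

def mean_cov(trials):
    """Average normalized spatial covariance across trials."""
    return np.mean([x @ x.T / np.trace(x @ x.T) for x in trials], axis=0)

# CSP on the training split: jointly diagonalize the class covariances and
# keep spatial filters from both ends of the eigenvalue spectrum.
train, test = slice(0, 80), slice(80, None)
C0 = mean_cov(X[train][y[train] == 0])
C1 = mean_cov(X[train][y[train] == 1])
_, W = eigh(C0, C0 + C1)               # generalized eigendecomposition
csp = W[:, [0, 1, -2, -1]].T           # 4 most discriminative filters

# Log band power of the spatially filtered signals as classifier features.
feats = np.log(np.var(csp @ X, axis=-1))
clf = LinearDiscriminantAnalysis().fit(feats[train], y[train])
print("held-out accuracy:", clf.score(feats[test], y[test]))
```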



Introduction

Machine learning (ML) has been developing continuously since its beginnings in the middle of the 20th century, with the introduction of the first computers. With recent accelerated improvements in computing power and the availability of massive datasets, ML methods based on deep neural networks, usually referred to as deep learning [1], gave rise to an Artificial Intelligence (AI) revolution. ML algorithms are based on the mathematical modelling of variables, and of their interaction mechanisms, that can explain the observations (dataset). Complex datasets, such as natural images, speech or brain signals, usually require sophisticated ML models to capture the probability distribution of the data. The most sophisticated ML models are built upon feed-forward deep neural networks with dozens to hundreds of layers, leading to very large sets of model parameters.
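To make the last remark concrete, the short sketch below counts the weights and biases of a plain fully connected feed-forward network; the depth and layer widths are arbitrary assumptions chosen only to show how the parameter count grows with network size.

```python
# Parameter count of a plain feed-forward (fully connected) network:
# a layer with n_in inputs and n_out outputs has n_in*n_out weights
# plus n_out biases. Widths and depth below are illustrative assumptions.
widths = [1024] + [512] * 100 + [10]    # input, 100 hidden layers, output

params = sum(n_in * n_out + n_out
             for n_in, n_out in zip(widths[:-1], widths[1:]))
print(f"{params:,} parameters")         # ~26.5 million for this sketch
```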
