Abstract

Unsupervised matrix completion algorithms mostly model the data-generation process with linear latent variable models. Recently proposed algorithms introduce non-linearity via multi-layer perceptrons (MLPs), and self-supervision by setting up a separate linear regression framework for each feature to estimate its missing values. In this article, we introduce an MLP-based algorithm called feature-specific neural matrix completion (FSNMC), which combines self-supervised and non-linear methods. The model parameters are estimated by a rotational scheme that alternates between parameter updates and missing-value updates, with additional heuristic steps to prevent over-fitting and speed up convergence. The proposed algorithm specifically targets small- to medium-sized datasets. Experimental results on real-world and synthetic datasets of varying size and missing-value percentage demonstrate the superior accuracy of FSNMC over popular methods in the literature, especially at low sparsity levels. The proposed method has particular potential for estimating missing data collected via real experimentation in the fundamental life sciences.
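
To make the feature-specific, self-supervised setup concrete, the following Python sketch (not the authors' implementation) regresses each feature column on the remaining columns with a small MLP and alternates model fits with imputation updates, loosely mirroring the rotational scheme described above. The function name, the scikit-learn MLP, and all hyperparameters are illustrative assumptions, and the paper's heuristic anti-overfitting steps are omitted.

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    def fsnmc_sketch(M, mask, n_rounds=5, hidden=(32,)):
        """Illustrative feature-specific neural completion loop.

        M    : (n_samples, n_features) array; values at missing entries are ignored
        mask : boolean array of the same shape, True where M is observed
        Assumes every feature has at least one observed entry.
        """
        # Initialize missing entries with per-feature means of the observed values.
        col_means = np.nanmean(np.where(mask, M, np.nan), axis=0)
        X = np.where(mask, M, col_means)
        for _ in range(n_rounds):                   # alternate fits and imputations
            for j in range(X.shape[1]):             # one regression problem per feature
                inputs = np.delete(X, j, axis=1)    # predictors: all other features
                obs = mask[:, j]
                model = MLPRegressor(hidden_layer_sizes=hidden, max_iter=500)
                model.fit(inputs[obs], X[obs, j])   # self-supervised: observed entries as targets
                if (~obs).any():
                    X[~obs, j] = model.predict(inputs[~obs])  # refresh only the missing entries
        return X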

Highlights

  • The presence of missing values is a common problem that degrades dataset quality and disrupts data analysis

  • Inference methods based on machine learning techniques have shown significant promise for the matrix completion task, with wide-ranging applications from recommender systems [1] to operations research [2], and from image processing [3] to product development [4] and high-failure-rate experiments [5]

  • The nuclear norm, a convex relaxation of the rank function, simultaneously minimizes all of the singular values (the relaxation is written out below)
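
For reference, the relaxation referred to in the last highlight is the standard one from the matrix completion literature: the non-convex rank objective is replaced by the nuclear norm, the sum of all singular values, written here in LaTeX over an observed index set Ω:

    % rank minimization over the observed index set \Omega
    \min_{X}\ \operatorname{rank}(X) \quad \text{s.t.} \quad X_{ij} = M_{ij},\ (i,j) \in \Omega
    % convex relaxation: the nuclear norm penalizes all singular values at once
    \min_{X}\ \|X\|_{*} = \sum_{i} \sigma_{i}(X) \quad \text{s.t.} \quad X_{ij} = M_{ij},\ (i,j) \in \Omega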



Introduction

The presence of missing values is a common problem that degrades dataset quality and disrupts data analysis. A common approach is to assume that the data matrix is low-rank, which turns the task into a rank minimization problem. This problem is ill-posed due to the non-convexity and discontinuity of the rank function [6]. A standard remedy replaces the rank with the nuclear norm, its convex envelope, which makes the problem tractable. Variants modify the nuclear norm by pruning out the largest singular values [11] or by considering only a partial sum of the singular values [12]. All of these methods still require a singular value decomposition (SVD), which is computationally expensive, especially for large datasets. Recent studies try to overcome this drawback by using orthogonal matching pursuit [13] and multiple factor norms that make the optimization smooth and convex [14].
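
To illustrate the SVD bottleneck noted above, here is a minimal Python sketch of nuclear-norm-style completion by iterative singular value soft-thresholding (in the spirit of SoftImpute-type methods, not the method proposed in this paper); each iteration recomputes a full SVD, which is the expensive step. The function name and the threshold lam are illustrative.

    import numpy as np

    def svt_complete(M, mask, lam=1.0, n_iters=100, tol=1e-4):
        """Nuclear-norm-style completion by iterative singular value soft-thresholding.

        M    : (m, n) array; values at missing entries are ignored
        mask : boolean array, True where M is observed
        lam  : soft-threshold applied to all singular values at once
        """
        X = np.where(mask, M, 0.0)                            # start with zeros at missing entries
        for _ in range(n_iters):
            U, s, Vt = np.linalg.svd(X, full_matrices=False)  # full SVD each pass: the costly step
            X_new = (U * np.maximum(s - lam, 0.0)) @ Vt       # shrink every singular value
            X_new = np.where(mask, M, X_new)                  # keep observed entries fixed
            if np.linalg.norm(X_new - X) <= tol * max(np.linalg.norm(X), 1.0):
                return X_new                                  # converged
            X = X_new
        return X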
