High-dimensional and incomplete (HDI) data are commonly encountered in big data-related applications that involve complex interactions among numerous nodes, such as the user-item interactions in a recommender system. A stochastic gradient descent (SGD)-based latent factor analysis (LFA) model can perform efficient representation learning on such HDI data, thereby extracting useful knowledge from them. However, a standard SGD algorithm updates a latent factor based on the current stochastic gradient only, without considering past update information, which makes the resultant model converge slowly. To address this critical issue, this paper proposes an Adaptive Non-linear PID-incorporated SGD (ANPS) algorithm built on two ideas: 1) rebuilding the instant learning error used to compute the stochastic gradient, following the principle of a nonlinear PID controller, so that past update information is incorporated into the learning scheme efficiently, and 2) adapting the gain parameters following the principle of particle swarm optimization (PSO). Experiments on six widely adopted HDI datasets demonstrate that, compared with state-of-the-art LFA models, an ANPS-based LFA model achieves significant advantages in both efficiency and accuracy. Moreover, its flexible gain parameter adaptation mechanism greatly boosts its practicability in real applications.

Note to Practitioners: In many industrial applications such as recommender systems, social network systems, and cloud service systems, people usually encounter numerous nodes and their highly incomplete relationships. An HDI matrix is commonly adopted to describe such relationships. One of the major challenges is to acquire useful knowledge from an HDI matrix efficiently and accurately for various data analysis tasks, e.g., accurate recommendation, community detection, and web service selection. An SGD-based LFA model has been widely adopted to tackle this issue. However, it suffers from slow convergence, which leads to considerable time cost on large-scale datasets. This study proposes an ANPS algorithm following the principle of a nonlinear PID controller. With it, an ANPS-based LFA model is obtained that converges quickly on an industrial HDI matrix. The proposed ANPS algorithm can also be applied to other types of machine learning models, thereby improving their utility and scalability in practice.
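
To make the first idea concrete, the following is a minimal sketch of an SGD-based LFA model whose instant error is rebuilt from proportional, integral, and derivative terms before it enters the gradient. It is an illustration under stated assumptions, not the ANPS algorithm itself: the gains `kp`, `ki`, and `kd` are fixed and linear, whereas the abstract describes nonlinear gains adapted by PSO, and every name here (`train_pid_sgd_lfa`, `rmse`, `DEMO_ENTRIES`) and every hyperparameter value is hypothetical.

```python
import math
import random

# Hypothetical toy HDI matrix: (row, column, value) for observed entries only.
DEMO_ENTRIES = [
    (0, 0, 5.0), (0, 1, 3.0), (0, 3, 1.0),
    (1, 0, 4.0), (1, 2, 2.0), (1, 3, 1.0),
    (2, 0, 1.0), (2, 1, 1.0), (2, 3, 5.0),
    (3, 0, 1.0), (3, 2, 5.0), (3, 3, 4.0),
]


def init_factors(n_rows, n_cols, rank, seed=0):
    """Randomly initialise the row factors P and column factors Q."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_rows)]
    Q = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_cols)]
    return P, Q


def rmse(entries, P, Q):
    """Root mean squared error over the observed entries."""
    total = 0.0
    for u, i, r in entries:
        pred = sum(pu * qi for pu, qi in zip(P[u], Q[i]))
        total += (r - pred) ** 2
    return math.sqrt(total / len(entries))


def train_pid_sgd_lfa(entries, P, Q, lr=0.02, reg=0.01,
                      kp=1.0, ki=0.005, kd=0.1, epochs=300):
    """SGD for LFA where the instant error is replaced by a PID-refined error.

    Assumption: fixed linear gains; the nonlinear gains and PSO-based
    adaptation described in the abstract are not modelled here.
    """
    rank = len(P[0])
    integral = {}  # accumulated past errors per observed entry (I term)
    previous = {}  # error from the previous visit per entry (D term)
    for _ in range(epochs):
        for u, i, r in entries:
            pred = sum(P[u][k] * Q[i][k] for k in range(rank))
            err = r - pred
            acc = integral.get((u, i), 0.0) + err
            diff = err - previous.get((u, i), err)
            refined = kp * err + ki * acc + kd * diff
            integral[(u, i)] = acc
            previous[(u, i)] = err
            for k in range(rank):
                pu, qi = P[u][k], Q[i][k]
                # Standard regularised SGD step, driven by the refined error.
                P[u][k] += lr * (refined * qi - reg * pu)
                Q[i][k] += lr * (refined * pu - reg * qi)
    return P, Q
```

Setting `ki = kd = 0` and `kp = 1` reduces the update to plain SGD, which makes it easy to compare the two on the same initial factors.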