Parallel Stochastic Gradient Descent Research Articles

Latent factor analysis (LFA) is an efficient way to discover knowledge from the high-dimensional and incomplete (HDI) matrices frequently encountered in industrial big-data applications. Stochastic gradient descent (SGD) is commonly adopted as the learning algorithm for LFA owing to its high efficiency. However, its sequential nature limits its scalability on large-scale data. Although alternating SGD decouples the LFA process to enable parallelization, its performance relies on hyper-parameters that are expensive to tune. To address this issue, this paper presents three extended alternating SGD algorithms whose hyper-parameters are made adaptive through particle swarm optimization. Correspondingly, three Parallel Adaptive LFA (PAL) models are proposed that achieve highly efficient latent factor acquisition from an HDI matrix. Experiments are conducted on four HDI matrices collected from industrial applications, with benchmark LFA models based on state-of-the-art parallel SGD algorithms, including alternating SGD, Hogwild!, distributed gradient descent, and sparse matrix factorization parallelization. The results demonstrate that, with 32 threads, the proposed PAL models achieve a much higher speedup gain than the benchmarks. They also achieve the highest prediction accuracy for missing data in most cases.

Note to Practitioners: HDI data are commonly encountered in industrial big-data applications, and rich knowledge and patterns can be extracted from them. An SGD-based LFA model is a popular choice for HDI data because of its efficiency. Yet on large-scale HDI data, its serial nature greatly reduces its scalability. Although alternating SGD can decouple the LFA process to enable parallelization, its performance depends on hyper-parameters whose tuning is tedious. To address this issue, this study proposes three extended alternating SGD algorithms whose hyper-parameters are made adaptive through a particle swarm optimizer. Based on them, three models are realized that efficiently obtain latent factors from HDI matrices. Compared with existing state-of-the-art models, they offer a hyper-parameter-adaptive learning process together with highly competitive computational efficiency and representation learning ability. Hence, they provide practitioners with more scalable solutions for large HDI data from industrial applications.
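The decoupling that alternating SGD relies on can be illustrated with a minimal sketch. It is not the paper's PAL algorithm: the function names, the fixed learning rate `lr`, and the regularization constant `reg` are assumptions made for illustration, and the particle swarm adaptation of hyper-parameters is omitted. The sketch only shows why, once one factor matrix is held fixed, each row of the other can be updated independently and therefore in parallel.

```python
import math
import random


def alternating_sgd_lfa(entries, n_rows, n_cols, rank=2, lr=0.05,
                        reg=0.01, epochs=500, seed=0):
    """Fit R ~ P Q^T on the known cells only.

    entries: list of (row, col, value) for the observed cells.
    lr and reg are fixed here (assumption); the paper adapts them.
    """
    rng = random.Random(seed)
    P = [[rng.uniform(0.0, 0.5) for _ in range(rank)] for _ in range(n_rows)]
    Q = [[rng.uniform(0.0, 0.5) for _ in range(rank)] for _ in range(n_cols)]

    # Group the observed cells by row and by column once.
    by_row, by_col = {}, {}
    for u, i, r in entries:
        by_row.setdefault(u, []).append((i, r))
        by_col.setdefault(i, []).append((u, r))

    for _ in range(epochs):
        # Phase 1: Q is fixed. Row u of P depends only on its own cells,
        # so each iteration of this loop could run on a separate thread.
        for u, cells in by_row.items():
            for i, r in cells:
                err = r - sum(P[u][k] * Q[i][k] for k in range(rank))
                for k in range(rank):
                    P[u][k] += lr * (err * Q[i][k] - reg * P[u][k])
        # Phase 2: P is fixed. Each row of Q (one per column of R) is independent.
        for i, cells in by_col.items():
            for u, r in cells:
                err = r - sum(P[u][k] * Q[i][k] for k in range(rank))
                for k in range(rank):
                    Q[i][k] += lr * (err * P[u][k] - reg * Q[i][k])
    return P, Q


def rmse(entries, P, Q):
    """Root mean squared error over the given cells."""
    total = 0.0
    for u, i, r in entries:
        pred = sum(a * b for a, b in zip(P[u], Q[i]))
        total += (r - pred) ** 2
    return math.sqrt(total / len(entries))
```

The loops run serially here to keep the sketch short; in a threaded version, the rows of `P` in phase 1 (and the rows of `Q` in phase 2) would be partitioned across workers with no locking, since no two workers write to the same row.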

With the spread and deepening of service-oriented computing, more and more enterprises and organizations are building their applications by integrating third-party Web services in the cloud. Building high-quality applications has long been a critical research issue. Quality of service (QoS) prediction provides valuable information for selecting the optimal Web service from a set of functionally equivalent candidates. Collaborative filtering techniques such as matrix factorization (MF) are commonly used to predict unknown QoS values; most of them model user-service interaction directly from QoS data, or take side information such as geographical location and network autonomous region into account. Because they overlook the implicit but important wide-range characteristic of QoS data, existing MF methods may incur high user and service biases, and their prediction accuracy suffers on wide-range QoS data. In this work, we first investigate the wide-range characteristic among users and services on a real-world Web service QoS dataset, and argue that this finding is an essential factor for accurate QoS prediction. We then propose a novel prediction model named Wide-Range Aware Matrix Factorization (WRAMF), which explicitly tackles the wide-range influence by combining bias information with an activation function mapping. Compared with existing MF-based models, WRAMF is optimized with an adaptive learning rate strategy that guides it toward the optimal solution accurately, and is trained efficiently with a well-designed parallel stochastic gradient descent algorithm. Comprehensive experiments on a real-world QoS dataset show that WRAMF significantly outperforms state-of-the-art methods in terms of accuracy and efficiency.
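The abstract does not give WRAMF's formulation, so the following is only a generic sketch of the two ingredients it names: bias terms combined with latent factors, and a function that maps the raw score into the observed QoS range. The logistic mapping, the bounds `lo` and `hi`, and every function and parameter name are assumptions for illustration, not the authors' model.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def predict(mu, b_u, b_s, p, q, lo, hi):
    """Map a biased MF score into the QoS range [lo, hi] (assumed bounds)."""
    z = mu + b_u + b_s + sum(a * b for a, b in zip(p, q))
    return lo + (hi - lo) * sigmoid(z)


def sgd_step(r, mu, b_u, b_s, p, q, lo, hi, lr=0.01, reg=0.01):
    """One SGD update on a single observed QoS value r.

    Minimizes 0.5 * (r - prediction)^2 plus L2 regularization.
    Returns the updated (b_u, b_s, p, q); mu is kept constant.
    """
    z = mu + b_u + b_s + sum(a * b for a, b in zip(p, q))
    s = sigmoid(z)
    err = r - (lo + (hi - lo) * s)
    # Chain rule through the logistic mapping: d(prediction)/dz = (hi - lo) * s * (1 - s)
    g = err * (hi - lo) * s * (1.0 - s)
    new_b_u = b_u + lr * (g - reg * b_u)
    new_b_s = b_s + lr * (g - reg * b_s)
    new_p = [pk + lr * (g * qk - reg * pk) for pk, qk in zip(p, q)]
    new_q = [qk + lr * (g * pk - reg * qk) for pk, qk in zip(p, q)]
    return new_b_u, new_b_s, new_p, new_q
```

Because the output is squashed into `[lo, hi]`, a prediction can never fall outside the observed range, which is one plausible way a mapping of this kind limits the effect of extreme values; the fixed `lr` stands in for the adaptive learning rate strategy the abstract mentions.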

Articles published on Parallel Stochastic Gradient Descent

Hierarchical Weight Averaging for Deep Neural Networks

Adaptively-Accelerated Parallel Stochastic Gradient Descent for High-Dimensional and Incomplete Data Representation Learning

Parallel Adaptive Stochastic Gradient Descent Algorithms for Latent Factor Analysis of High-Dimensional and Incomplete Industrial Data

Decentralized Parallel SGD Based on Weight-Balancing for Intelligent IoV

Load balanced locality-aware parallel SGD on multicore architectures for latent factor based collaborative filtering

Laplacian Matrix Sampling for Communication-Efficient Decentralized Learning

On the Convergence of Hybrid Server-Clients Collaborative Training

A Graph Neural Network Based Decentralized Learning Scheme

STL-SGD: Speeding Up Local SGD with Stagewise Communication Period

Decentralized Parallel SGD With Privacy Preservation in Vehicular Networks

Efficient and High-quality Recommendations via Momentum-incorporated Parallel Stochastic Gradient Descent-Based Learning

Guided parallelized stochastic gradient descent for delay compensation

A(DP)²SGD: Asynchronous Decentralized Parallel Stochastic Gradient Descent With Differential Privacy

Asynchronous Decentralized Distributed Training of Acoustic Models

D-(DP)²SGD: Decentralized Parallel SGD with Differential Privacy in Dynamic Networks

Decentralized Distributed Deep Learning with Low-Bandwidth Consumption for Smart Constellations

WP-SGD: Weighted parallel SGD for distributed unbalanced-workload training system

Distributed Training of Deep Neural Network Acoustic Models for Automatic Speech Recognition: A comparison of current training strategies

An accurate and efficient web service QoS prediction model with wide-range awareness

A Distributed Intrusion Detection Scheme for Cloud Computing
