Abstract

In many big data mining scenarios with large numbers of samples, heavy computation cost hinders the application of machine learning: training must iterate by passing over the whole dataset without considering the roles that different samples play in the computation. However, we argue that most of the samples that dominate computation resources contribute little to the gradient-based model update, particularly when the model is close to convergence. We define this observation as the Sample Contribution Pattern (SCP) in machine learning. This paper proposes two approaches that exploit SCP by detecting gradient characteristics and triggering the reuse of outdated gradients. In particular, this paper reports research results on (1) the definition and description of SCP, which reveals an intrinsic gradient-contribution pattern across different samples; (2) a novel SCP-based optimization algorithm (SCPOA) that outperforms the alternative algorithms tested in terms of computation overhead; (3) a variant of SCPOA that incorporates a discarding-recovering mechanism to carefully trade off model accuracy against computation cost; (4) the implementation and evaluation of the two algorithms on popular distributed big data mining platforms running typical sample sets; (5) an intuitive convergence proof for the algorithms. Our experimental results illustrate that the proposed approaches can significantly reduce computation cost while maintaining competitive accuracy.
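To make the gradient-reuse idea concrete, the following is a minimal toy sketch, not the paper's actual SCPOA implementation: the threshold `tau`, the period `recover_every`, and the gradient-norm test are illustrative assumptions standing in for the paper's gradient-characteristic detection. Each sample's gradient is cached; samples whose gradients become negligible are frozen and their stale gradients reused; a periodic recovering step re-activates frozen samples, loosely mirroring the discarding-recovering variant.

```python
import numpy as np

def scpoa_sketch(X, y, grad_fn, theta, lr=0.1, tau=1e-3, recover_every=5, epochs=20):
    """Toy sketch of per-sample gradient reuse (illustrative, not the paper's algorithm).

    grad_fn(x_i, y_i, theta) returns the gradient of sample i's loss at theta.
    Samples whose gradient norm falls below `tau` are frozen: their cached
    (outdated) gradients are reused instead of being recomputed.
    """
    n = len(y)
    cached = [grad_fn(X[i], y[i], theta) for i in range(n)]  # first full pass
    active = np.ones(n, dtype=bool)
    for t in range(epochs):
        if t % recover_every == 0:
            active[:] = True                              # "recovering": re-examine every sample
        for i in range(n):
            if active[i]:
                cached[i] = grad_fn(X[i], y[i], theta)    # fresh gradient
                if np.linalg.norm(cached[i]) < tau:
                    active[i] = False                     # low contribution: freeze, reuse stale gradient
        theta = theta - lr * np.mean(cached, axis=0)      # update mixes fresh and stale gradients
    return theta

# Usage on a tiny least-squares problem: grad of (x.theta - y)^2 / 2 is (x.theta - y) * x
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5])
g = lambda x, yi, th: (x @ th - yi) * x
theta = scpoa_sketch(X, y, g, theta=np.zeros(3))
```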

Highlights

  • Big data mining is widely used in many application fields where the velocity of exploiting information and knowledge from large scale datasets is critical [1], [2]

  • In the context of gradient descent (GD), this paper proposes a new gradient-based method that can skip the gradient computation of a certain subset of samples by reusing their outdated gradients, which justifies the term Sample Contribution Pattern-based Optimization Algorithm (SCPOA)

  • The experimental environment is a cluster system consisting of 10 nodes by default, each equipped with a 4-core Intel Xeon CPU with a 2.4 GHz base frequency

Introduction

Big data mining is widely used in many application fields where the velocity of exploiting information and knowledge from large-scale datasets is critical [1], [2]. These studies aim to train models and optimize model parameters using gradient-based iterative algorithms; computation-efficient algorithms have been developed to solve the problem $\min_{\theta} L(\theta)$ with $L(\theta) := \frac{1}{m}\sum_{i=1}^{m} \ell_i(\theta)$ (1), where $\ell_i(\theta)$ is the loss on the $i$-th of the $m$ training samples. There are various optimization methods for training tasks, such as gradient-based [8] and heuristic optimization algorithms. Although the latter have been used in many areas, gradient-based optimization algorithms, with their stronger theoretical guarantees, still play an important role in large-scale machine learning [7]. Gradient information is helpful even in some heuristic optimization methods, such as Shark Smell Optimization [10], where the gradient is used to update the shark's ‘‘forward movement.’’
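To make the per-sample cost in (1) explicit, a standard gradient-descent step expands into one gradient evaluation per sample (standard GD notation, assumed here rather than quoted from the paper):

```latex
\theta_{t+1} = \theta_t - \eta \, \nabla L(\theta_t)
             = \theta_t - \frac{\eta}{m} \sum_{i=1}^{m} \nabla \ell_i(\theta_t)
```

Each iteration therefore costs $m$ per-sample gradient evaluations. The Sample Contribution Pattern observes that, near convergence, many of the terms $\nabla \ell_i(\theta_t)$ barely change or shrink toward zero, so recomputing them on every pass is wasted work; reusing their outdated values is precisely the saving SCPOA targets.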
