Abstract

As the demand for real-time data processing increases, a high-speed processing platform for large-scale stream data becomes necessary. To process large-scale stream data quickly, it is essential to use multiple distributed nodes. So far, there have been few studies on real-time massive image processing through efficient management and allocation of heterogeneous resources across user-specified nodes in distributed environments. In this paper, we present a new platform called RIDE (Real-time massive Image processing platform on Distributed Environment), which efficiently allocates resources and performs load balancing according to the volume of stream data in a distributed environment. It minimizes communication overhead by using a parallel processing strategy that handles stream data with coarse-grained and fine-grained parallelism simultaneously. Coarse-grained parallelism is achieved by automatically allocating input streams onto partitions of the broker buffer, each processed by its corresponding worker node, and is maximized by adaptive resource management, which adjusts the number of worker nodes in a group according to the frame rate in real time. Fine-grained parallelism is achieved by parallel processing of tasks on each worker node and is maximized by appropriately allocating heterogeneous resources such as GPUs and embedded machines. Moreover, RIDE provides an application topology scheme that yields higher performance by configuring the worker nodes of each stage through adaptive heterogeneous resource management. Finally, it supports dynamic fault tolerance for real-time image processing through coordination between the components of the system.
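
As an illustration of the coarse-grained strategy described above, the sketch below simulates how input frames might be spread round-robin over partitions of a broker buffer, with the size of the worker group adjusted to the observed frame rate. It is a minimal sketch only: RIDE's actual broker and resource-manager interfaces are not given in this abstract, so BrokerBuffer, WorkerGroup, publish, rescale, and the per-worker capacity figure are hypothetical.

```python
# Illustrative sketch only. RIDE's broker/worker interfaces are not published in
# the abstract, so all names and numbers below are hypothetical. The sketch shows
# the two ideas named above: (1) input frames are spread over partitions of a
# broker buffer, one worker per partition (coarse-grained parallelism), and
# (2) the worker group grows or shrinks with the observed frame rate
# (adaptive resource management).
import math
from collections import deque


class WorkerGroup:
    """Keeps just enough workers to sustain the observed input frame rate."""

    def __init__(self, frames_per_worker_per_sec):
        self.capacity_per_worker = frames_per_worker_per_sec
        self.num_workers = 1

    def rescale(self, observed_frame_rate):
        needed = max(1, math.ceil(observed_frame_rate / self.capacity_per_worker))
        if needed != self.num_workers:
            print(f"rescaling worker group: {self.num_workers} -> {needed}")
            self.num_workers = needed
        return self.num_workers


class BrokerBuffer:
    """A partitioned buffer; each partition feeds exactly one worker node."""

    def __init__(self, num_partitions):
        self.partitions = [deque() for _ in range(num_partitions)]

    def publish(self, frame_id, frame):
        # Round-robin allocation of incoming frames onto partitions.
        self.partitions[frame_id % len(self.partitions)].append(frame)


if __name__ == "__main__":
    group = WorkerGroup(frames_per_worker_per_sec=30)   # assumed worker throughput
    partitions = group.rescale(observed_frame_rate=90)  # e.g. a 90 fps input stream
    broker = BrokerBuffer(num_partitions=partitions)
    for frame_id in range(6):
        broker.publish(frame_id, frame=f"frame-{frame_id}")
    print([len(p) for p in broker.partitions])          # frames spread evenly: [2, 2, 2]
```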

Highlights

  • Today, data generated in real time, such as CCTV images, web logs, satellite images, and stock data, is increasing in volume, and there is a need to process large-scale data rapidly

  • Coarse-grained parallelism is achieved by the automatic allocation of input streams onto partitions, each processed by its corresponding worker node, and maximized by adaptive resource management, which adjusts the number of worker nodes in a group according to the frame rate in real time

  • Fine-grained parallelism is achieved by parallel processing of tasks on each worker node and maximized by appropriately allocating heterogeneous resources such as GPUs and embedded machines (a sketch of this idea follows below)
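
The sketch below illustrates the fine-grained side of this strategy: one worker node splits a frame into tiles and processes them in parallel, placing each task on one of several heterogeneous resources. It assumes nothing about RIDE's real scheduler; RESOURCES, pick_resource, and process_tile are illustrative stand-ins for application-defined per-frame tasks.

```python
# Illustrative sketch only. Fine-grained parallelism here means parallel
# processing of tasks within a single worker node, placed on heterogeneous
# resources (GPU vs. CPU/embedded). RIDE's actual scheduler is not shown in
# the abstract, so the resource pool and placement policy are hypothetical.
from concurrent.futures import ThreadPoolExecutor

RESOURCES = ["gpu:0", "cpu:0", "cpu:1"]  # assumed heterogeneous resource pool


def pick_resource(task_index):
    # Toy placement policy: alternate tasks over the available resources.
    return RESOURCES[task_index % len(RESOURCES)]


def process_tile(tile, resource):
    # A real worker would launch a GPU kernel or an embedded routine here;
    # this stub only tags the tile with the resource that "processed" it.
    return f"{tile} processed on {resource}"


def process_frame(frame, tile_count=4):
    # Split one frame into tiles (tasks) and process them in parallel.
    tiles = [f"{frame}/tile{i}" for i in range(tile_count)]
    with ThreadPoolExecutor(max_workers=len(RESOURCES)) as pool:
        futures = [pool.submit(process_tile, tile, pick_resource(i))
                   for i, tile in enumerate(tiles)]
        return [f.result() for f in futures]


if __name__ == "__main__":
    for line in process_frame("frame-42"):
        print(line)
```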

Introduction

Data generated in real time, such as CCTV images, web logs, satellite images, and stock data, is increasing in volume, and there is a need to process such large-scale data rapidly. Distributed processing technologies such as Hadoop [1], which use multiple nodes, have been developed to process large-scale data. Hadoop has become popular due to its MapReduce model, which builds on the Hadoop Distributed File System (HDFS) [2] and automatic data management. The concept of MapReduce, however, is geared toward batch processing rather than real-time processing.
