Using HBase to Implement Speed Layer in Time Series Data Storage Systems

Milko Marinov

doi:10.14569/ijacsa.2022.0130245

Abstract

In recent years, systems have become increasingly integrated, and a central challenge is now delivering real-time analytics on big data. Standard software tools cannot always extract information from such datasets. The Lambda Architecture proposed by Marz is an architectural solution that manages the processing of large data volumes by combining real-time and batch processing techniques. Choosing a suitable database management system for storing large volumes of time series data is not trivial, as aspects such as low latency, high performance and horizontal scalability must be taken into account. NoSQL approaches use non-relational databases for this purpose, which offer significant advantages in flexibility and performance over traditional relational databases. The purpose of this paper is therefore to analyse the general characteristics of time series data and the main activities performed by the Speed layer in a system based on the Lambda Architecture. On this basis, the use of a column-oriented NoSQL DBMS as a system for storing time series data is justified. The paper also addresses the challenges of using HBase as a system for storing and analysing time series data. These challenges concern the design of an appropriate database schema, the balance between ease of access to the data and performance, and the factors that cause individual nodes in the system to become overloaded.

Highlights

Lambda Architecture defines several layers that correspond to a set of tools and techniques for building a big data processing system, i.e., a speed layer, a serving layer, and a batch layer [9].
The proposal to use a column-oriented NoSQL DBMS for storing time series data is based on the general characteristics of time series data and the main activities performed by the Speed layer in a Lambda Architecture-based system.
The Lambda Architecture provides a consistent approach to building a big data system that can perform real-time data storage and processing in a low-latency, high-throughput, and fault-tolerant manner.

Summary

INTRODUCTION

The accelerated development of technologies applied to big data has caused significant changes in the storage, retrieval, and processing of data. An important property of data with respect to its processing is immutability. Lambda Architecture defines several layers that correspond to a set of tools and techniques for building a big data processing system, i.e., a speed layer, a serving layer, and a batch layer [9]. The main objective of the current research is to justify the use of a column-oriented NoSQL DBMS as a system for storing time series data. This proposal is based on the general characteristics of time series data and the main activities performed by the Speed layer in a Lambda Architecture-based system.
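As an illustration of how the layers relate, the following minimal Python sketch models an immutable master dataset, a batch view recomputed from it, and a speed layer that incrementally covers only the data the batch view has not yet absorbed. It is a toy model under stated assumptions, not the paper's implementation: the names `Reading`, `batch_view`, `SpeedLayer`, and `query`, the per-sensor sum as the computed view, and the sample data are all hypothetical, and in-memory dictionaries stand in for the actual stores.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: readings are immutable once recorded
class Reading:
    sensor: str
    timestamp: int
    value: float


def batch_view(master, up_to):
    """Batch layer: recompute per-sensor totals from the whole master dataset."""
    totals = defaultdict(float)
    for r in master:
        if r.timestamp <= up_to:
            totals[r.sensor] += r.value
    return dict(totals)


class SpeedLayer:
    """Speed layer: incrementally updated view of recent data only."""

    def __init__(self):
        self.view = defaultdict(float)

    def ingest(self, reading):
        self.view[reading.sensor] += reading.value


def query(sensor, batch, speed):
    """Serving side: merge the batch view with the real-time view."""
    return batch.get(sensor, 0.0) + speed.view.get(sensor, 0.0)


# Hypothetical sample data: an append-only tuple as the master dataset.
master = (
    Reading("s1", 1, 1.0),
    Reading("s1", 2, 2.0),
    Reading("s1", 3, 4.0),  # arrived after the last batch run
)
last_batch_ts = 2
batch = batch_view(master, up_to=last_batch_ts)

speed = SpeedLayer()
for r in master:
    if r.timestamp > last_batch_ts:
        speed.ingest(r)

total_s1 = query("s1", batch, speed)
```

Because the master dataset is never modified, the batch view can always be rebuilt from scratch, and the speed layer's state can be discarded once a later batch run has covered the same readings.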

RELATED WORK

LAMBDA ARCHITECTURE OVERVIEW

CHARACTERISTICS OF TIME SERIES DATA

STORING AND PROCESSING TIME SERIES DATA IN HBASE

CONCLUSION