A high-bandwidth and low-cost data processing approach with heterogeneous storage architectures

Bing Wei,Baicheng Yan,Yao Song,Wei Wei,Zhisheng Huo,Limin Xiao

doi:10.1007/s00779-020-01383-6

Abstract

How to efficiently process big data at a low cost is a substantial challenge. Many efficient and economical data processing approaches have been proposed in the fields including business, scientific research, and public administration. Unfortunately, seismic data processing has not achieved the same level of devolvement in the field of oil exploration. While many storage architectures, such as network-attached storage (NAS) and storage area network (SAN), have been widely used to process massive amounts of seismic data, these architectures are expensive in terms of bandwidth and capacity. In this paper, we propose a high-bandwidth and low-cost approach to fill this gap. NASStore is our data store built on NAS for processing seismic data. However, it cannot provide a high bandwidth at a low cost when it comes to data-intensive computing scenarios due to the massive bandwidth requirement and the huge volume of data to be stored. Distributed file systems, such as the Hadoop Distributed File System (HDFS), offer an alternative approach to store data. It delivers high aggregate performance to user applications while running on inexpensive commodity hardware. In order to overcome the shortcomings of NASStore, we first present HDFSStore that is built on HDFS for processing seismic data. We then couple NASStore and HDFSStore to construct a new hybrid data store, called SeisStore, in which efficient parallel write, read, and update mechanisms are employed to improve the system performance. The experiment results show that SeisStore reduces the storage cost than NASStore by up to 23.20% and improves the access bandwidth than NASStore and HDFSStore by up to 478.84% and 16.99%, respectively.

Full Text