Abstract
In the era of big data applications, the demand for more sophisticated data centers and high-performance data processing mechanisms is increasing drastically. Data originally reside in storage systems; to process them, application servers must fetch them from the storage devices, which imposes a data-movement cost on the system. This cost grows with the distance between the processing engines and the data, which is the key motivation for the emergence of distributed processing platforms such as Hadoop that move processing closer to the data. Computational storage devices (CSDs) push the “move process to data” paradigm to its ultimate boundaries by deploying embedded processing engines inside storage devices to process data. In this paper, we introduce Catalina, an efficient and flexible computational storage platform that provides a seamless environment for in-place data processing. Catalina is the first CSD equipped with a dedicated application processor running a full-fledged operating system that provides filesystem-level data access for applications. Thus, a vast spectrum of applications can be ported to run on Catalina CSDs. Owing to these unique features, to the best of our knowledge, Catalina is the only in-storage processing platform that can be seamlessly deployed in clusters to run distributed applications such as Hadoop MapReduce and HPC applications in-place, without any modification to the underlying distributed processing framework. As a proof of concept, we build a fully functional Catalina prototype and a platform equipped with 16 Catalina CSDs, and run Intel HiBench Hadoop and HPC benchmarks to investigate the benefits of deploying Catalina CSDs in distributed processing environments. The experimental results show up to a 2.2× improvement in performance and a 4.3× reduction in energy consumption for the Hadoop MapReduce benchmarks. Additionally, thanks to the Neon SIMD engines, the performance and energy efficiency of DFT algorithms improve by up to 5.4× and 8.9×, respectively.
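The DFT result above refers to vectorizing the transform's multiply-accumulate kernel on the CSD's Arm cores. The paper's actual kernel is not reproduced here; the following is a minimal illustrative sketch of how such a kernel can use Neon intrinsics on an AArch64 target. The function name dft_neon, the fixed length N, and the twiddle-table handling are assumptions for illustration, not taken from the paper.

```c
/* Minimal sketch (not the paper's code): a naive O(N^2) DFT of a real
 * signal whose inner multiply-accumulate loop is vectorized with Arm
 * Neon intrinsics, as one might run on a CSD's AArch64 application cores. */
#include <arm_neon.h>
#include <math.h>

#define N 1024                      /* assumed transform length, multiple of 4 */

void dft_neon(const float *x, float *re_out, float *im_out)
{
    float cos_tab[N], sin_tab[N];

    for (int k = 0; k < N; k++) {
        /* Scalar precompute of the twiddle factors for output bin k. */
        for (int n = 0; n < N; n++) {
            double ang = 2.0 * M_PI * (double)k * n / N;
            cos_tab[n] = (float)cos(ang);
            sin_tab[n] = (float)sin(ang);
        }

        /* Neon part: accumulate 4 input samples per iteration.
         * X[k] = sum_n x[n] * (cos(ang) - i*sin(ang))               */
        float32x4_t acc_re = vdupq_n_f32(0.0f);
        float32x4_t acc_im = vdupq_n_f32(0.0f);
        for (int n = 0; n < N; n += 4) {
            float32x4_t xv = vld1q_f32(&x[n]);
            acc_re = vmlaq_f32(acc_re, xv, vld1q_f32(&cos_tab[n]));
            acc_im = vmlsq_f32(acc_im, xv, vld1q_f32(&sin_tab[n]));
        }
        re_out[k] = vaddvq_f32(acc_re);  /* horizontal sums (AArch64-only intrinsic) */
        im_out[k] = vaddvq_f32(acc_im);
    }
}
```

A production kernel would use an FFT and reuse cached twiddle tables across bins; the sketch only shows where the Neon multiply-accumulate fits into the computation.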
Highlights
Modern life has become deeply technologized, and nowadays people rely on big data applications for services such as healthcare, entertainment, government services, and transportation in their day-to-day lives
We argue that computational storage devices (CSDs) can considerably improve the performance of high-performance computing (HPC) applications when they utilize application-specific integrated circuit (ASIC)-based accelerators such as Neon advanced single-instruction-multiple-data (SIMD) engines
From the system-level point of view, Catalina CSDs are similar to regular processing nodes, and the underlying in-storage processing (ISP) hardware and software details are invisible to the other nodes in the cluster
Summary
Modern life has become deeply technologized, and nowadays people rely on big data applications for services such as healthcare, entertainment, government services, and transportation in their day-to-day lives. To process such data, large volumes must frequently move between the storage systems and the memory units of the application servers. This costly data movement increases energy consumption and degrades the performance of big data applications. To overcome this issue, data processing has moved toward a new paradigm: “move process to data” rather than moving high volumes of data. A modern SSD controller is composed of two main parts: (1) a front-end (FE) processing engine that provides a high-speed host interface protocol such as NVMe over PCIe, and (2) a back-end (BE) processing engine that deals with flash management routines. These two engines communicate with each other to complete the host’s I/O commands.
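The FE/BE split can be pictured with a small sketch. The C fragment below is not Catalina firmware; under assumed type and function names, it only mimics how a host read command received by the front-end might be resolved block by block through the back-end's flash-management routines.

```c
/* Illustrative sketch only (assumed names, not Catalina firmware):
 * the front-end (FE) parses a host read command arriving over the
 * NVMe/PCIe interface, and the back-end (BE) resolves each block
 * through its flash-management routines (FTL, channel scheduling). */
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 4096

typedef struct {            /* simplified host read command */
    uint64_t lba;           /* starting logical block address */
    uint32_t num_blocks;    /* number of blocks to read       */
} host_read_cmd_t;

/* BE side: a real back-end maps the LBA to a physical flash page
 * and issues the flash read; here it is a stub that zero-fills. */
static int be_read_page(uint64_t lba, uint8_t *page_buf)
{
    (void)lba;
    memset(page_buf, 0, PAGE_SIZE);   /* stand-in for a flash read */
    return 0;
}

/* FE side: walks the command's LBA range and asks the BE for each
 * page; the data would then be DMA-ed back to the host over PCIe
 * (omitted in this sketch). */
int fe_handle_read(const host_read_cmd_t *cmd, uint8_t *host_buf)
{
    for (uint32_t i = 0; i < cmd->num_blocks; i++) {
        if (be_read_page(cmd->lba + i, host_buf + (size_t)i * PAGE_SIZE) != 0)
            return -1;      /* report an I/O error to the host */
    }
    return 0;
}
```

In a CSD such as Catalina, an in-storage processing engine sits alongside these two engines so that data can be processed on the drive itself instead of being shipped to the application servers, as described above.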