Boosting HPC data analysis performance with the ParSoDA-Py library

Loris Belcastro, Paolo Trunfio, Nihad Mammadli, Rosa M. Badia, Salvatore Giampà, Domenico Talia, Jorge Ejarque, Fabrizio Marozzo

doi:10.1007/s11227-023-05883-z

Abstract

Developing and executing large-scale data analysis applications in parallel and distributed environments can be a complex and time-consuming task. Developers are often diverted from their application logic to handle technical details of the underlying runtime and related issues. To simplify this process, ParSoDA, a Java library, has been proposed to facilitate the development of parallel data mining applications executed on HPC systems. It provides built-in scalability mechanisms that rely on the Hadoop and Spark frameworks. This paper presents ParSoDA-Py, the Python version of the ParSoDA library, which extends support to commonly used runtimes and libraries for big data analysis. After a complete library redesign, ParSoDA can now be easily integrated with other Python-based distributed runtimes for HPC systems, such as COMPSs and Apache Spark, and with the large ecosystem of Python-based data processing libraries. The paper discusses the adaptation process, which takes into consideration the new technical requirements, and evaluates both usability and scalability through several case study applications.

Journal: The Journal of Supercomputing
Publication Date: Feb 2, 2024
License type: CC BY 4.0

Similar Papers

EasyPAB: An Extensible IDE Framework for Parallel Applications
Yu Ce ... Wu Huabei
22 Nov 2007

A portable, high-level graph analytics framework targeting distributed, heterogeneous systems
...
13 Nov 2016

Exploiting Spark for HPC Simulation Data
Ming Jiang ... Albert Chu
15 Jan 2020

Cloud computing and big data: Technologies and applications
Mostapha Zbakh ... Mohamed Essaaidi
Concurrency and Computation: Practice and Experience | VOL. 30
20 May 2018
