Abstract
HPC (super-computing) clusters are designed to execute computationally intensive operations that typically involve large-scale I/O, most commonly through a standard MPI library implemented in C/C++. MPI-I/O performance in HPC clusters varies significantly over a range of configuration parameters that are generally not taken into account by the algorithm. Optimising I/O is commonly left to individual practitioners on a case-by-case basis at code level, which can lead to a range of unforeseen outcomes. The ExSeisDat utility is built on top of the native MPI-I/O library and comprises Parallel I/O and Workflow Libraries to process seismic data encapsulated in the SEG-Y file format. The SEG-Y data structure is complex, owing to its alternating arrangement of trace headers and trace data; its size scales to petabytes, and the chances of I/O performance degradation are further increased by ExSeisDat. This research paper presents a novel study of the changing I/O performance, in terms of bandwidth, using parallel plots against various MPI-I/O, Lustre (Parallel) File System and SEG-Y File parameters. Another novel aspect of this research is the predictive modelling of MPI-I/O behaviour over SEG-Y File benchmarks using Artificial Neural Networks (ANNs). The accuracy ranges from 62.5% to 96.5% over the set of trained ANN models, and the computed Mean Square Error (MSE), Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE) values further support the generalisation of the prediction models. This paper demonstrates that, by using our ANN prediction technique, the configurations can be tuned beforehand to avoid poor I/O performance.
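The MSE, MAE and MAPE metrics cited above are standard regression-error measures. As a minimal illustration only (the data values here are hypothetical, not results from the paper), they can be computed as:

```python
def mse(y_true, y_pred):
    """Mean Square Error: average of squared prediction errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean Absolute Error: average of absolute prediction errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error (targets must be non-zero)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical measured vs. predicted I/O bandwidth values (e.g. GB/s)
measured = [1.0, 2.0, 4.0]
predicted = [1.1, 1.8, 4.4]
print(mse(measured, predicted))   # ≈ 0.07
print(mae(measured, predicted))   # ≈ 0.233
print(mape(measured, predicted))  # ≈ 10.0 (each error is 10% of its target)
```

Lower values of all three metrics on held-out benchmarks indicate that a prediction model generalises rather than overfits.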
Highlights
Seismic data is one of the most critical resources for geophysicists studying and understanding the Earth's structure beneath its surface or the seabed.
The ExSeisDat utility is built on top of the native Message Passing Interface (MPI)-I/O library and comprises Parallel I/O and Workflow Libraries to process seismic data encapsulated in the SEG-Y file format.
This research paper presents a novel study of the changing I/O performance, in terms of bandwidth, using parallel plots against various MPI-I/O, Lustre (Parallel) File System and SEG-Y File parameters. Another novel aspect of this research is the predictive modelling of MPI-I/O behaviour over SEG-Y File benchmarks using Artificial Neural Networks (ANNs).
Summary
Seismic data is one of the most critical resources for geophysicists studying and understanding the Earth's structure beneath its surface or the seabed. The Extreme-Scale Seismic Data (ExSeisDat) Library is developed to process SEG-Y files efficiently on HPC clusters through its Parallel I/O Library (PIOL) and Workflow Library [4]. Parallel MPI-I/O struggles to overcome a program's performance degradation because the achievable I/O bandwidth depends on several parameters, and this is the case when MPI-I/O is applied within ExSeisDat to the processing of SEG-Y files. These parameters include the number of MPI processes running on compute nodes, the Parallel File System (PFS) managing multiple storage objects (the Lustre File System (LFS) [6] in our case), and file properties, access patterns, etc.
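As an illustrative sketch only, not the paper's actual models, features, or training data, an ANN that maps configuration parameters to a bandwidth estimate can be as simple as a one-hidden-layer feedforward network trained by gradient descent. The feature names and the synthetic target below are assumptions for demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical normalised features: [MPI process count, stripe count, file size];
# synthetic target standing in for measured I/O bandwidth. Not real benchmark data.
X = rng.uniform(0.0, 1.0, size=(200, 3))
y = (2.0 * X[:, 0] + X[:, 1] - 0.5 * X[:, 2]).reshape(-1, 1)

# One tanh hidden layer (8 units), linear output: a minimal regression MLP.
W1 = rng.normal(0.0, 0.5, size=(3, 8)); b1 = np.zeros((1, 8))
W2 = rng.normal(0.0, 0.5, size=(8, 1)); b2 = np.zeros((1, 1))

lr = 0.1
for _ in range(2000):
    h = np.tanh(X @ W1 + b1)          # forward pass: hidden activations
    pred = h @ W2 + b2                # forward pass: bandwidth prediction
    g_pred = 2.0 * (pred - y) / len(X)  # gradient of mean squared error
    g_W2 = h.T @ g_pred
    g_b2 = g_pred.sum(axis=0, keepdims=True)
    g_h = (g_pred @ W2.T) * (1.0 - h ** 2)  # backprop through tanh
    g_W1 = X.T @ g_h
    g_b1 = g_h.sum(axis=0, keepdims=True)
    W2 -= lr * g_W2; b2 -= lr * g_b2
    W1 -= lr * g_W1; b1 -= lr * g_b1

train_mse = float(np.mean((np.tanh(X @ W1 + b1) @ W2 + b2 - y) ** 2))
print(f"training MSE: {train_mse:.4f}")
```

In the paper's setting, such a model would be trained on benchmark measurements and queried before a run, so that poorly performing parameter combinations can be avoided in advance.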