High performance computing of waves, currents and contaminants in rivers and coastal areas of seas on multi-processors systems and GPUs

Maxim Sorokin, Oleksandr Pylypenko, Pavlo Kolomiets, Mark Zheleznyak, Sergii Kivva

doi:10.5194/egusphere-egu2020-11372

Abstract

Shallow-water flows in coastal sea areas, rivers and reservoirs are usually simulated with 2-D depth-averaged models. In practical applications, however, the need for fine grid resolution over large modeling areas requires HPC algorithms and hardware. We compare the computational efficiency of the parallel 2-D modeling system COASTOX on CPU-based multi-processor systems and on GPUs.

The hydrodynamic module of COASTOX is based on the nonlinear shallow water equations (SWE), which describe currents and long waves, including tsunamis, river flood waves and wake waves generated by large vessels in shallow coastal areas. Waves generated by moving vessels are introduced through a special pressure term in the momentum equations that depends on the shape of the vessel's draft. Nearshore currents generated by wind waves are described by including wave radiation stress terms in the SWE. Sediment and pollutant transport are described by 2-D advection-diffusion equations with sink-source terms for sedimentation-erosion and water-bottom contaminant exchange.

The model equations are solved by the finite volume method on rectangular grids or on unstructured grids with triangular cells. The SWE solution scheme is Godunov-type, explicit and conservative, and has the TVD property. Second-order accuracy in time and space is achieved with a Runge-Kutta predictor-corrector method that uses different flux calculation methods at the predictor and corrector steps. The transport equations are solved with simple upwind schemes that are first order in time and space.

For multi-core CPU systems, the model is parallelized by domain decomposition with halo boundary structures and message-passing updates. Unstructured model grids are decomposed with the METIS graph partitioning library. Halo values are updated through MPI using non-blocking send and receive functions.

For GPU computations, the model is parallelized with the OpenACC directive-based programming interface. Because the schemes are explicit and local, they are implemented as loops over cells, nodes and faces with independent iterations. The OpenACC directives inserted in the model code therefore tell the compiler which loops may be computed in parallel.

The efficiency of the parallel algorithms is demonstrated on CPU and GPU systems for the following applications:

1. Simulation of the extreme flood of July 2008 on the Prut River (Ukraine).
2. Modeling of ship waves caused by a tanker passage on the San Jacinto River near the Barbours Cut Container Terminal (USA) and of the resulting loads on a moored container ship.
3. Simulation of the consequences of breaks in the dikes constructed on the heavily contaminated floodplain of the Pripyat River upstream of the Chernobyl Nuclear Power Plant.

Parallel performance was tested on a Dell 7920 workstation with two 20-core Intel Xeon Gold 6230 processors and an NVIDIA Quadro RTX 5000 GPU. Multi-core computation is up to 17.3 times faster than single-core computation, with a parallel efficiency of 43%. For large computational grids (about a million nodes or more), the GPU is 47.5-79.6 times faster than a single core and 3-4.6 times faster than the multi-core workstation.
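The first-order upwind transport scheme mentioned in the abstract can be illustrated in one dimension. The sketch below is not COASTOX code: it assumes a uniform 1-D grid, a constant positive velocity, periodic boundaries, and no diffusion or source terms, and the names `upwind_step` and `total_mass` are hypothetical. It is meant only to show that each cell update reads old values alone, which is the independence property that makes such loops candidates for directive-based parallelization.

```python
def upwind_step(c, u, dx, dt):
    """Advance cell-averaged concentrations c by one explicit upwind step.

    Illustrative assumptions: uniform 1-D grid, constant u >= 0,
    periodic boundaries, no diffusion or source terms.
    """
    cfl = u * dt / dx
    if not 0.0 <= cfl <= 1.0:
        raise ValueError("CFL number must lie in [0, 1] for stability")
    n = len(c)
    # The flux through the left face of cell i uses the upwind value c[i-1];
    # c[-1] wraps around to the last cell, which gives the periodic boundary.
    # Each new value depends only on old values, so iterations are independent.
    return [c[i] - cfl * (c[i] - c[i - 1]) for i in range(n)]


def total_mass(c, dx):
    """Total tracer mass on the grid, used to check conservation."""
    return sum(c) * dx
```

With a CFL number of 0.5, a unit pulse in one cell is split evenly between that cell and its downstream neighbor, and the total mass is unchanged. The smearing of the pulse is the numerical diffusion expected from a first-order scheme.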
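The reported parallel efficiency is consistent with the usual definition, efficiency = speedup / number of cores. The short check below assumes that the multi-core runs used all 40 cores of the workstation (2 processors with 20 cores each), which the abstract implies but does not state explicitly. The function name `parallel_efficiency` is hypothetical.

```python
def parallel_efficiency(speedup, n_cores):
    """Parallel efficiency as the ratio of speedup to core count."""
    if n_cores <= 0:
        raise ValueError("n_cores must be positive")
    return speedup / n_cores


# Assumption: all 2 x 20 = 40 cores were used for the 17.3x speedup.
cores = 2 * 20
efficiency = parallel_efficiency(17.3, cores)
efficiency_percent = round(efficiency * 100)  # rounds to the reported 43%
```

Under that assumption, 17.3 / 40 is about 0.43, matching the 43% quoted in the abstract.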
