A fully unsplit wave propagation algorithm for shallow water flows on GPUs
Donna Calhoun, Melody Shih, Scott Aiton, Xinshen Qu
SIAM Parallel Processing Conference, Society for Industrial and Applied Mathematics, Seattle, WA (USA) (Mini-symposium speaker) Feb 11 - Feb 15, 2020
This talk focuses on a GPU implementation of a fully multi-dimensional patch solver based on the wave propagation algorithm (WPA) (R. J. LeVeque, Clawpack). Our CUDA implementation is designed for the small, fixed-size patches (32x32) used in the larger block-based adaptive code ForestClaw (D. Calhoun and C. Burstedde). To update patches on the GPU, we batch-process O(1000) patches per kernel call. Each patch is assigned to a single CUDA thread block, eliminating the need for synchronization between blocks. By redesigning the WPA, we are able to completely update the solution on each patch in a single batch kernel call. To avoid branch divergence, special attention is given to the implementation of the wave limiters. The GPU implementation runs about 5-7x faster than a single CPU. We will demonstrate our algorithm using examples from the shallow water wave equations implemented in ForestClaw.
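As an illustration of the limiter remark above, the following is a minimal sketch of how standard limiter functions can be written without data-dependent branches, using only min, max and absolute-value operations. The abstract does not say which limiters the CUDA code uses or how it avoids divergence, so the function names `phi_minmod`, `phi_mc` and `phi_vanleer` and the choice of limiters are assumptions made here for illustration. The argument `theta` is the usual ratio of the upwind wave strength to the local wave strength. The code is plain C++; the same expressions could be used in CUDA device functions.

```cpp
#include <cmath>

// Hypothetical helper names; not taken from the ForestClaw or Clawpack sources.
// Each function evaluates a standard limiter phi(theta) with no if/else on
// theta, so all threads in a warp execute the same instruction sequence.

// Minmod limiter: phi(theta) = max(0, min(1, theta)).
inline double phi_minmod(double theta) {
    return std::fmax(0.0, std::fmin(1.0, theta));
}

// Monotonized central (MC) limiter:
// phi(theta) = max(0, min((1 + theta) / 2, 2, 2 * theta)).
inline double phi_mc(double theta) {
    double m = std::fmin(0.5 * (1.0 + theta), 2.0);
    return std::fmax(0.0, std::fmin(m, 2.0 * theta));
}

// van Leer limiter: phi(theta) = (theta + |theta|) / (1 + |theta|).
// The numerator vanishes for theta <= 0, so no sign test is needed.
inline double phi_vanleer(double theta) {
    double a = std::fabs(theta);
    return (theta + a) / (1.0 + a);
}
```

In a wave propagation solver, a limited wave would then be obtained by scaling each wave by `phi(theta)` before computing the second-order correction fluxes; how `theta` is computed and guarded against a zero denominator is left out of this sketch.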