The Rocky Road to Tasking: Task Queues Reloaded
Laura Morgenstern, Jülich Supercomputing Centre | Ivo Kabadshow, Jülich Supercomputing Centre
GTC 2020
We'll show you how to parallelize irregular algorithms on GPUs with tasking, starting with an overview of our CUDA C++ tasking framework for fine-grained task parallelism. After touching on persistent threads, synchronization mechanisms, and load balancing, we'll present three optimization strategies. First, we'll describe the implementation of task queues based on static memory allocation. Second, we'll show how to implement work sharing on a GPU through hierarchical task queues. Third, we'll present a thread coordination scheme that reduces contention on the task queues and so keeps all threads busy. We'll analyze the performance gain of each optimization step for a prototype implementation of a task-based fast multipole method for molecular dynamics.
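To make the first idea concrete, here is a minimal host-side C++ sketch of a task queue backed by statically sized storage and drained by long-lived worker threads. It is not the speakers' framework: `Task`, `StaticTaskQueue`, and `run_sketch` are hypothetical names, `std::thread` workers merely stand in for persistent GPU threads, and the sketch assumes all tasks are enqueued by a single thread before any worker starts.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical task type: sum the integers in [begin, end).
struct Task {
    int begin;
    int end;
};

// Queue whose storage is sized at compile time, so no memory is
// allocated while tasks are being processed.
// Assumption: push() is called by one thread, and only before workers start.
template <std::size_t Capacity>
class StaticTaskQueue {
public:
    bool push(const Task& t) {
        std::size_t slot = tail_.load(std::memory_order_relaxed);
        if (slot >= Capacity) return false;  // queue is full
        slots_[slot] = t;
        tail_.store(slot + 1, std::memory_order_release);
        return true;
    }

    // Safe for many concurrent callers: fetch_add hands each caller
    // a unique slot index, so no task is executed twice.
    bool pop(Task& out) {
        std::size_t slot = head_.fetch_add(1, std::memory_order_relaxed);
        if (slot >= tail_.load(std::memory_order_acquire)) return false;
        out = slots_[slot];
        return true;
    }

private:
    std::array<Task, Capacity> slots_{};
    std::atomic<std::size_t> head_{0};
    std::atomic<std::size_t> tail_{0};
};

// Fills the queue with 10 tasks covering [0, 1000) and lets
// num_workers threads drain it; returns the sum of all integers seen.
long long run_sketch(unsigned num_workers) {
    StaticTaskQueue<64> queue;
    for (int i = 0; i < 10; ++i) {
        bool ok = queue.push({i * 100, (i + 1) * 100});
        assert(ok);
        (void)ok;
    }

    std::atomic<long long> total{0};
    std::vector<std::thread> workers;
    for (unsigned w = 0; w < num_workers; ++w) {
        workers.emplace_back([&queue, &total] {
            Task t;
            // Each worker stays alive and keeps pulling tasks until none remain.
            while (queue.pop(t)) {
                long long local = 0;
                for (int v = t.begin; v < t.end; ++v) local += v;
                total.fetch_add(local, std::memory_order_relaxed);
            }
        });
    }
    for (auto& th : workers) th.join();
    return total.load();
}
```

Calling `run_sketch(4)` distributes the ten tasks across four workers. Every worker contends on the single shared `head_` counter, which is the kind of contention that hierarchical queues and thread coordination are meant to reduce.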