The Rocky Road to Tasking: Task Queues Reloaded
Laura Morgenstern, Jülich Supercomputing Centre | Ivo Kabadshow, Jülich Supercomputing Centre
We'll show you how to parallelize your irregular algorithm on GPUs with tasking, starting with an overview of our CUDA C++ tasking framework for fine-grained task parallelism. After touching on persistent threads, synchronization mechanisms, and load balancing, we'll present diverse optimization strategies. First, we'll describe the implementation of task queues based on static memory allocation. Second, we'll show how to implement work sharing on a GPU through hierarchical task queues. Third, we'll present a thread coordination scheme that reduces contention on the task queues, thus keeping all threads busy. We'll analyze each optimization step's performance gains for a prototypical implementation of a task-based fast multipole method for molecular dynamics.