Technical Walkthrough

Boosting Application Performance with GPU Memory Prefetching

NVIDIA GPUs have enormous compute power and typically must be fed data at high speed to deploy that power. That is possible, in principle, because GPUs also... 10 MIN READ
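The post digs into software prefetching inside a kernel. As a rough sketch of the idea only (the kernel, data layout, and one-iteration prefetch distance below are illustrative assumptions, not code from the post), a loop can load the next iteration's operand into a register while computing on the current one:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative kernel: each thread sums a strided slice of `in`.
// The value for the *next* loop iteration is loaded into a register
// one iteration early, so the load latency overlaps with the
// current iteration's arithmetic (software prefetching).
__global__ void strided_sum_prefetch(const float *in, float *out, int n)
{
    int tid    = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;

    float sum  = 0.0f;
    float next = (tid < n) ? in[tid] : 0.0f;   // prefetch the first element

    for (int i = tid; i < n; i += stride) {
        float cur = next;                      // value prefetched earlier
        int   j   = i + stride;
        next = (j < n) ? in[j] : 0.0f;         // prefetch the next element
        sum += cur * cur;                      // compute with the current value
    }
    if (tid < n) out[tid] = sum;
}

int main()
{
    const int n = 1 << 20;
    float *in, *out;
    cudaMallocManaged(&in,  n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; ++i) in[i] = 1.0f;

    strided_sum_prefetch<<<256, 256>>>(in, out, n);
    cudaDeviceSynchronize();

    printf("out[0] = %f\n", out[0]);           // expect 16.0 with this grid and n
    cudaFree(in);
    cudaFree(out);
    return 0;
}
```

The extra register lets the global-memory load for iteration i+1 overlap with the arithmetic of iteration i, which is the latency-hiding effect prefetching targets.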
Technical Walkthrough

Making Apache Spark More Concurrent

Apache Spark provides capabilities to program entire clusters with implicit data parallelism. With Spark 3.0 and the open source RAPIDS Accelerator for Spark,... 7 MIN READ
GPU Pro Tip
Technical Walkthrough

GPU Pro Tip: CUDA 7 Streams Simplify Concurrency

Heterogeneous computing is about efficiently using all processors in the system, including CPUs and GPUs. To do this, applications must execute functions... 8 MIN READ
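The post covers how CUDA 7's stream behavior simplifies this. As a minimal sketch of the underlying mechanism, issuing independent work into explicitly created streams so it may run concurrently (the kernel, buffer names, and sizes are illustrative assumptions, not code from the post):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Trivial kernel that works on one array so that concurrent
// execution across streams is observable in a profiler.
__global__ void scale(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main()
{
    const int nStreams = 4;
    const int n        = 1 << 20;

    cudaStream_t streams[nStreams];
    float *buffers[nStreams];

    for (int s = 0; s < nStreams; ++s) {
        cudaStreamCreate(&streams[s]);
        cudaMalloc(&buffers[s], n * sizeof(float));
        cudaMemset(buffers[s], 0, n * sizeof(float));
    }

    // Work issued to different streams may execute concurrently;
    // work issued to the same stream executes in order.
    for (int s = 0; s < nStreams; ++s)
        scale<<<(n + 255) / 256, 256, 0, streams[s]>>>(buffers[s], n, 2.0f);

    cudaDeviceSynchronize();

    for (int s = 0; s < nStreams; ++s) {
        cudaStreamDestroy(streams[s]);
        cudaFree(buffers[s]);
    }
    printf("done\n");
    return 0;
}
```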
Technical Walkthrough

How to Overlap Data Transfers in CUDA C/C++

In our last CUDA C/C++ post we discussed how to transfer data efficiently between the host and device. In this post, we discuss how to overlap data... 12 MIN READ
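The post walks through overlapping transfers with computation. A minimal sketch of the chunked pattern it builds on, pinned host memory plus cudaMemcpyAsync and kernel launches issued into multiple streams, follows (the array size, chunk count, and increment kernel are illustrative assumptions):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void increment(float *d, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] += 1.0f;
}

int main()
{
    const int    n          = 1 << 22;
    const int    nStreams   = 4;
    const int    chunk      = n / nStreams;
    const size_t chunkBytes = chunk * sizeof(float);

    float *h, *d;
    cudaMallocHost(&h, n * sizeof(float));   // pinned host memory, required for truly async copies
    cudaMalloc(&d, n * sizeof(float));
    for (int i = 0; i < n; ++i) h[i] = 0.0f;

    cudaStream_t streams[nStreams];
    for (int s = 0; s < nStreams; ++s) cudaStreamCreate(&streams[s]);

    // Each chunk's H2D copy, kernel, and D2H copy are queued in its own
    // stream, so one chunk's copies can overlap another chunk's kernel.
    for (int s = 0; s < nStreams; ++s) {
        int offset = s * chunk;
        cudaMemcpyAsync(d + offset, h + offset, chunkBytes,
                        cudaMemcpyHostToDevice, streams[s]);
        increment<<<(chunk + 255) / 256, 256, 0, streams[s]>>>(d + offset, chunk);
        cudaMemcpyAsync(h + offset, d + offset, chunkBytes,
                        cudaMemcpyDeviceToHost, streams[s]);
    }
    cudaDeviceSynchronize();

    printf("h[0] = %f\n", h[0]);             // expect 1.0
    for (int s = 0; s < nStreams; ++s) cudaStreamDestroy(streams[s]);
    cudaFreeHost(h);
    cudaFree(d);
    return 0;
}
```

Whether the copies and kernels actually overlap depends on the device's copy-engine count and the relative cost of each stage, which is the kind of trade-off the post examines.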
Technical Walkthrough

How to Overlap Data Transfers in CUDA Fortran

CUDA Fortran for Scientists and Engineers shows how high-performance application developers can... 12 MIN READ