Pro Tip: cuBLAS Strided Batched Matrix Multiply

Research, Algorithms & Numerical Techniques, CUDA, Education & Training, Machine Learning & Artificial Intelligence

Nadeem Mohammad, posted Feb 28 2017

There’s a new computational workhorse in town. For decades, general matrix-matrix multiply, known as GEMM in Basic Linear Algebra Subprograms (BLAS) libraries, has been a standard benchmark for computational performance. GEMM is possibly the most optimized and widely used routine in scientific computing. Expert implementations are available for every architecture and quickly achieve the peak performance of […]

Explicit Multi-GPU with DirectX 12 – Frame Pipelining, a New Alternative

DirectX 12, GameWorks Expert Developer, GameWorks

Juha Sjoholm, posted Feb 28 2017

This is the second part of the blog post about explicit multi-GPU programming with DirectX 12. In this part, I’ll describe frame pipelining, a new way to use multiple GPUs that was not possible before DirectX 12. I’ll first explain pipelining in general and then go through a case study.

Explicit Multi-GPU with DirectX 12 – Control, Freedom, New Possibilities

DirectX 12, GameWorks Expert Developer, GameWorks

Juha Sjoholm, posted Feb 28 2017

This blog post is about explicit multi-GPU programming, which became possible with the introduction of the DirectX 12 API. In previous versions of DirectX, the driver had to manage multiple SLI GPUs. Now, DirectX 12 gives that control to the application. The post has two parts. In this first part, I’ll explain how multiple GPUs are exposed in the DirectX API, with some pointers to the API documentation. Please look for further details in the documentation itself.

Pro Tip: cuBLAS Strided Batched Matrix Multiply

CUDA Pro Tip, CUBLAS, CUDA, CUDA 8, Deep Learning, Linear Algebra, Machine Learning, Tensors

Nadeem Mohammad, posted Feb 27 2017

There’s a new computational workhorse in town. For decades, general matrix-matrix multiply, known as GEMM in Basic Linear Algebra Subprograms (BLAS) libraries, has been a standard benchmark for computational performance. GEMM is possibly the most optimized and widely used routine in scientific computing. Expert implementations are available for every architecture and quickly achieve the peak performance of […]

Nsight Visual Studio Edition 5.3 at GDC 2017

Nsight Visual Studio Edition, GameWorks Expert Developer, GameWorks

Robert Bischof, posted Feb 27 2017

NVIDIA Nsight™ Visual Studio Edition 5.3, being shown at GDC 2017, will soon be available for download in the NVIDIA Registered Developer Program.

This release adds OpenVR support alongside the current Oculus SDK support for virtual reality development in Direct3D and OpenGL applications. Vive and Oculus demos will be on display at the NVIDIA booth.

We’ll also be showing off Nsight’s frame debugging and profiling on laptops using Windows 10 Hybrid mode.
