Volta

Jun 05, 2023

CUDA 12.1 Supports Large Kernel Parameters

CUDA kernel function parameters are passed to the device through constant memory and have been limited to 4,096 bytes. CUDA 12.1 increases this parameter limit...

5 MIN READ

Jul 02, 2019

Case Study: ResNet50 with DALI

Let’s imagine a situation. You buy a brand-new, cutting-edge, Volta-powered DGX-2 server. You’ve done your math right, expecting a 2x performance increase in...

11 MIN READ

Apr 16, 2019

Machine Learning Acceleration in Vulkan with Cooperative Matrices

Machine learning harnesses computing power to solve a variety of ‘hard’ problems that seemed impossible to program using traditional languages and...

8 MIN READ

Apr 02, 2019

Tensor Core Programming Using CUDA Fortran

The CUDA Fortran compiler from PGI now supports programming Tensor Cores with NVIDIA’s Volta V100 and Turing GPUs. This enables scientific programmers using...

12 MIN READ

Mar 13, 2019

Speeding Up Semantic Segmentation Using MATLAB Container from NVIDIA NGC

Gone are the days of using a single GPU to train a deep learning model. With computationally intensive algorithms such as semantic segmentation, a single GPU...

8 MIN READ

Jan 30, 2019

Video Series: Mixed-Precision Training Techniques Using Tensor Cores for Deep Learning

Neural networks with thousands of layers and millions of neurons demand high performance and faster training times. The complexity and size of neural networks...

5 MIN READ

Jan 23, 2019

Using Tensor Cores for Mixed-Precision Scientific Computing

Double-precision floating point (FP64) has been the de facto standard for doing scientific simulation for several decades. Most numerical methods used in...

9 MIN READ

Nov 07, 2018

CUDA on Turing Opens New GPU Compute Possibilities

The Turing architecture introduces so many cool new features that it’s easy to miss the quiet revolution in GPU programming that it also represents: all of the...

9 MIN READ

Aug 21, 2018

NVSwitch Accelerates NVIDIA DGX-2

NVIDIA CEO Jensen Huang described the NVIDIA® DGX-2™ server as "the world's largest GPU" at its launch during GPU Technology Conference earlier this...

8 MIN READ

Jun 19, 2018

Introducing Apex: PyTorch Extension with Tools to Realize the Power of Tensor Cores

Today at the Computer Vision and Pattern Recognition Conference in Salt Lake City, Utah, NVIDIA is kicking off the conference by demonstrating an early release...

2 MIN READ

Jun 08, 2018

Summit GPU Supercomputer Enables Smarter Science

Today the world of open science received its greatest asset in the form of the Summit supercomputer at Oak Ridge National Laboratory (ORNL). This represents an...

11 MIN READ

May 31, 2018

A Trio of New Nsight Tools That Empower Developers to Fully Optimize their CPU and GPU Performance

Three big NVIDIA Nsight releases on the same day! Nsight Systems is a brand new optimization tool; Nsight Visual Studio Edition 5.6 extends support to Volta...

6 MIN READ

Apr 21, 2018

OpenSeq2Seq: New Toolkit for Distributed and Mixed-Precision Training of Sequence-to-Sequence Models

Researchers at NVIDIA open-sourced v0.2 of OpenSeq2Seq – a new toolkit built on top of TensorFlow for training sequence-to-sequence models.

2 MIN READ

Jan 03, 2018

Nsight Visual Studio Edition 5.5 Introduces Graphics Pixel History, Next-Gen CUDA GPU+CPU debugging, Next-Gen CUDA Profiling, and now supports Volta GPUs, Win10 RS3, and CUDA 9.1

NVIDIA Nsight Visual Studio Edition 5.5 is now available for download in the NVIDIA Registered Developer Program. This release extends support to the latest...

2 MIN READ

Dec 04, 2017

TensorRT 3: Faster TensorFlow Inference and Volta Support

NVIDIA TensorRT™ is a high-performance deep learning inference optimizer and runtime that delivers low latency, high-throughput inference for deep...

19 MIN READ

Nov 19, 2017

Maximizing Unified Memory Performance in CUDA

Many of today's applications process large volumes of data. While GPU architectures have very fast HBM or GDDR memory, they have limited capacity. Making the...

18 MIN READ