Volta
Jun 05, 2023
CUDA 12.1 Supports Large Kernel Parameters
CUDA kernel function parameters are passed to the device through constant memory and have been limited to 4,096 bytes. CUDA 12.1 increases this parameter limit...
5 MIN READ
Jul 02, 2019
Case Study: ResNet50 with DALI
Let’s imagine a situation. You buy a brand-new, cutting-edge, Volta-powered DGX-2 server. You’ve done your math right, expecting a 2x performance increase...
11 MIN READ
Apr 16, 2019
Machine Learning Acceleration in Vulkan with Cooperative Matrices
Machine learning harnesses computing power to solve a variety of ‘hard’ problems that seemed impossible to program using traditional languages and...
8 MIN READ
Apr 02, 2019
Tensor Core Programming Using CUDA Fortran
The CUDA Fortran compiler from PGI now supports programming Tensor Cores with NVIDIA’s Volta V100 and Turing GPUs. This enables scientific programmers using...
12 MIN READ
Mar 13, 2019
Speeding Up Semantic Segmentation Using MATLAB Container from NVIDIA NGC
Gone are the days of using a single GPU to train a deep learning model. With computationally intensive algorithms such as semantic segmentation, a single GPU...
8 MIN READ
Jan 30, 2019
Video Series: Mixed-Precision Training Techniques Using Tensor Cores for Deep Learning
Neural networks with thousands of layers and millions of neurons demand high performance and faster training times. The complexity and size of neural networks...
5 MIN READ
Jan 23, 2019
Using Tensor Cores for Mixed-Precision Scientific Computing
Double-precision floating point (FP64) has been the de facto standard for doing scientific simulation for several decades. Most numerical methods used in...
9 MIN READ
Nov 07, 2018
CUDA on Turing Opens New GPU Compute Possibilities
The Turing architecture introduces so many cool new features that it’s easy to miss the quiet revolution in GPU programming that it also represents: all of...
9 MIN READ
Aug 21, 2018
NVSwitch Accelerates NVIDIA DGX-2
NVIDIA CEO Jensen Huang described the NVIDIA® DGX-2™ server as "the world's largest GPU" at its launch during the GPU Technology Conference earlier this...
8 MIN READ
Jun 19, 2018
Introducing Apex: PyTorch Extension with Tools to Realize the Power of Tensor Cores
Today at the Computer Vision and Pattern Recognition Conference in Salt Lake City, Utah, NVIDIA is kicking off the conference by demonstrating an early release...
2 MIN READ
Jun 08, 2018
Summit GPU Supercomputer Enables Smarter Science
Today the world of open science received its greatest asset in the form of the Summit supercomputer at Oak Ridge National Laboratory (ORNL). This represents an...
11 MIN READ
May 31, 2018
A Trio of New Nsight Tools That Empower Developers to Fully Optimize Their CPU and GPU Performance
Three big NVIDIA Nsight releases on the same day! Nsight Systems is a brand-new optimization tool; Nsight Visual Studio Edition 5.6 extends support to Volta...
6 MIN READ
Apr 21, 2018
OpenSeq2Seq: New Toolkit for Distributed and Mixed-Precision Training of Sequence-to-Sequence Models
Researchers at NVIDIA open-sourced v0.2 of OpenSeq2Seq – a new toolkit built on top of TensorFlow for training sequence-to-sequence models.
2 MIN READ
Jan 03, 2018
Nsight Visual Studio Edition 5.5 Introduces Graphics Pixel History, Next-Gen CUDA GPU+CPU debugging, Next-Gen CUDA Profiling, and now supports Volta GPUs, Win10 RS3, and CUDA 9.1
NVIDIA Nsight Visual Studio Edition 5.5 is now available for download in the NVIDIA Registered Developer Program. This release extends support to the latest…
2 MIN READ
Dec 04, 2017
TensorRT 3: Faster TensorFlow Inference and Volta Support
NVIDIA TensorRT™ is a high-performance deep learning inference optimizer and runtime that delivers low-latency, high-throughput inference for deep...
19 MIN READ
Nov 19, 2017
Maximizing Unified Memory Performance in CUDA
Many of today's applications process large volumes of data. While GPU architectures have very fast HBM or GDDR memory, they have limited capacity. Making the...
18 MIN READ