Memory
Mar 08, 2024
Optimizing Memory and Retrieval for Graph Neural Networks with WholeGraph, Part 1
Graph neural networks (GNNs) have revolutionized machine learning for graph-structured data. Unlike traditional neural networks, GNNs excel at capturing...
9 MIN READ
Dec 18, 2023
Deploying Retrieval-Augmented Generation Applications on NVIDIA GH200 Delivers Accelerated Performance
Large language model (LLM) applications are essential for enhancing productivity across industries through natural language. However, their effectiveness is...
10 MIN READ
Aug 22, 2023
Simplifying GPU Application Development with Heterogeneous Memory Management
Heterogeneous Memory Management (HMM) is a CUDA memory management feature that extends the simplicity and productivity of the CUDA Unified Memory programming...
16 MIN READ
Jun 27, 2022
Boosting Application Performance with GPU Memory Access Tuning
NVIDIA GPUs have enormous compute power and typically must be fed data at high speed to deploy that power. That is possible, in principle, as GPUs also have...
13 MIN READ
Jul 27, 2021
Using the NVIDIA CUDA Stream-Ordered Memory Allocator, Part 2
In part 1 of this series, we introduced new API functions, cudaMallocAsync and cudaFreeAsync, that enable memory allocation and deallocation to be...
9 MIN READ
Jul 27, 2021
Using the NVIDIA CUDA Stream-Ordered Memory Allocator, Part 1
Most CUDA developers are familiar with the cudaMalloc and cudaFree API functions to allocate GPU accessible memory. However, there has long been an obstacle...
14 MIN READ
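The two entries above introduce the stream-ordered allocation APIs. As a minimal hedged sketch (assumes CUDA 11.2 or later; the function body is illustrative, not code from the posts), allocation and deallocation are enqueued on a stream just like kernels:

```cuda
#include <cuda_runtime.h>

// Sketch: cudaMallocAsync/cudaFreeAsync order the allocation and the
// free with respect to other work on the stream, so no device-wide
// synchronization is needed between a free and later pool reuse.
void stream_ordered_example(size_t num_bytes, cudaStream_t stream) {
    void *ptr;
    // The allocation becomes valid once prior work on `stream` completes.
    cudaMallocAsync(&ptr, num_bytes, stream);
    // ... enqueue kernels on `stream` that use `ptr` ...
    // The free is also stream-ordered: memory returns to the pool only
    // after all preceding work on `stream` has finished.
    cudaFreeAsync(ptr, stream);
}
```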
Jul 19, 2021
Reducing Acceleration Structure Memory with NVIDIA RTXMU
Acceleration structures spatially organize geometry to accelerate ray tracing traversal performance. When you create an acceleration structure, a conservative...
11 MIN READ
May 20, 2021
Tips: Acceleration Structure Compaction
In ray tracing, more geometry can reside in GPU memory than with the rasterization approach, because rays may hit geometry outside the view frustum....
7 MIN READ
Jan 29, 2021
Managing Memory for Acceleration Structures in DirectX Raytracing
In Microsoft Direct3D, anything that uses memory is considered a resource: textures, vertex buffers, index buffers, render targets, constant buffers, structured...
6 MIN READ
Dec 18, 2020
Making Apache Spark More Concurrent
Apache Spark provides capabilities to program entire clusters with implicit data parallelism. With Spark 3.0 and the open source RAPIDS Accelerator for Spark,...
7 MIN READ
Dec 08, 2020
Fast, Flexible Allocation for NVIDIA CUDA with RAPIDS Memory Manager
When I joined the RAPIDS team in 2018, NVIDIA CUDA device memory allocation was a performance problem. RAPIDS cuDF allocates and deallocates memory at high...
24 MIN READ
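The RAPIDS Memory Manager entry above describes replacing per-call cudaMalloc/cudaFree with fast sub-allocation. A hedged C++ sketch of that idea using RMM's public memory-resource types (header paths and pool size are illustrative assumptions, not taken from the post):

```cuda
#include <rmm/mr/device/cuda_memory_resource.hpp>
#include <rmm/mr/device/pool_memory_resource.hpp>
#include <rmm/mr/device/per_device_resource.hpp>
#include <rmm/device_buffer.hpp>

// Sketch: route all device allocations through a pool that sub-allocates
// from one large upstream cudaMalloc, avoiding per-allocation overhead.
int main() {
    rmm::mr::cuda_memory_resource upstream;  // backs the pool via cudaMalloc
    rmm::mr::pool_memory_resource<rmm::mr::cuda_memory_resource>
        pool{&upstream, 1u << 30};           // 1 GiB initial pool (assumed size)
    rmm::mr::set_current_device_resource(&pool);
    // Subsequent allocations come from the pool at high speed.
    rmm::device_buffer buf(1 << 20, rmm::cuda_stream_default, &pool);
    return 0;
}
```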
Apr 15, 2020
Introducing Low-Level GPU Virtual Memory Management
There is a growing need among CUDA applications to manage memory as quickly and as efficiently as possible. Before CUDA 10.2, the number of options available to...
23 MIN READ
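The entry above refers to the low-level virtual memory management APIs added in CUDA 10.2. A rough hedged sketch of the reserve/create/map/access flow using the driver API (error handling omitted; the helper function is an illustration, not code from the post):

```cuda
#include <cuda.h>

// Sketch of the CUDA 10.2+ virtual memory flow: reserve a VA range,
// create physical memory, map it into the range, then enable access.
CUdeviceptr map_physical_memory(size_t size, int device) {
    CUmemAllocationProp prop = {};
    prop.type = CU_MEM_ALLOCATION_TYPE_PINNED;
    prop.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
    prop.location.id = device;

    size_t granularity = 0;
    cuMemGetAllocationGranularity(&granularity, &prop,
                                  CU_MEM_ALLOC_GRANULARITY_MINIMUM);
    size_t padded = ((size + granularity - 1) / granularity) * granularity;

    CUdeviceptr ptr = 0;
    cuMemAddressReserve(&ptr, padded, 0, 0, 0);   // reserve VA range

    CUmemGenericAllocationHandle handle;
    cuMemCreate(&handle, padded, &prop, 0);       // physical backing
    cuMemMap(ptr, padded, 0, handle, 0);          // map into the range

    CUmemAccessDesc access = {};
    access.location = prop.location;
    access.flags = CU_MEM_ACCESS_FLAGS_PROT_READWRITE;
    cuMemSetAccess(ptr, padded, &access, 1);      // enable read/write
    return ptr;
}
```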
Aug 06, 2019
GPUDirect Storage: A Direct Path Between Storage and GPU Memory
As AI and HPC datasets continue to increase in size, the time spent loading data for a given application begins to place a strain on the total application’s...
17 MIN READ
Feb 21, 2019
Optimizing End-to-End Memory Networks Using SigOpt and GPUs
Natural language systems have become the go-between for humans and AI-assisted digital services. Digital assistants, chatbots, and automated HR systems all rely...
18 MIN READ
Nov 19, 2017
Maximizing Unified Memory Performance in CUDA
Many of today's applications process large volumes of data. While GPU architectures have very fast HBM or GDDR memory, they have limited capacity. Making the...
18 MIN READ
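The Unified Memory entry above concerns tuning managed-memory performance when GPU capacity is limited. A minimal hedged sketch of one such technique, explicit prefetching (the function and sizes are illustrative assumptions, not code from the post):

```cuda
#include <cuda_runtime.h>

// Sketch: a managed allocation is valid on both host and device, with
// pages migrating on demand; prefetching moves them ahead of the kernel.
void unified_memory_example(size_t n, int device, cudaStream_t stream) {
    float *data;
    // One pointer usable from CPU and GPU; pages migrate on access.
    cudaMallocManaged(&data, n * sizeof(float));
    for (size_t i = 0; i < n; ++i) data[i] = 1.0f;   // touch on host
    // Prefetch to the GPU before launching work to avoid page-fault stalls.
    cudaMemPrefetchAsync(data, n * sizeof(float), device, stream);
    // ... launch kernels on `stream` that read `data` ...
    cudaFree(data);  // implicitly synchronizes the device
}
```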
Mar 25, 2014
NVLink, Pascal and Stacked Memory: Feeding the Appetite for Big Data
For more recent info on NVLink, check out the post, "How NVLink Will Enable Faster, Easier Multi-GPU Computing". NVIDIA GPU accelerators have emerged in...
5 MIN READ