Performance Optimization

Oct 02, 2023
Accelerated Vector Search: Approximating with RAPIDS RAFT IVF-Flat
Performing an exhaustive exact k-nearest neighbor (kNN) search, also known as brute-force search, is expensive, and it doesn’t scale particularly well to...
15 MIN READ
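The post above contrasts approximate search with exhaustive (brute-force) kNN. As a rough illustration of why the exact approach is costly, here is a minimal NumPy sketch (not taken from the post; sizes and array names are illustrative) that compares every query against every database vector before picking the k nearest:

```python
import numpy as np

# Illustrative sizes; real vector-search workloads are much larger.
n_database, n_queries, dim, k = 100_000, 10, 128, 5
rng = np.random.default_rng(0)
database = rng.standard_normal((n_database, dim), dtype=np.float32)
queries = rng.standard_normal((n_queries, dim), dtype=np.float32)

# Brute-force (exact) kNN: every query is compared against every database
# vector, so the cost grows as O(n_queries * n_database * dim).
# Squared distances via ||q - x||^2 = ||q||^2 - 2*q.x + ||x||^2
d2 = (
    (queries ** 2).sum(axis=1, keepdims=True)
    - 2.0 * (queries @ database.T)
    + (database ** 2).sum(axis=1)
)
# Indices of the k smallest distances per query (unordered within the top k).
neighbors = np.argpartition(d2, k, axis=1)[:, :k]
print(neighbors.shape)  # (n_queries, k)
```

Approximate indexes such as IVF-Flat avoid this full scan by probing only a subset of the database per query, trading a little recall for much lower cost.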

Sep 11, 2023
Accelerating Vector Search: Fine-Tuning GPU Index Algorithms
In this post, we dive deeper into each of the GPU-accelerated indexes mentioned in part 1 and give a brief explanation of how the algorithms work, along with a...
12 MIN READ

Sep 11, 2023
Accelerating Vector Search: Using GPU-Powered Indexes with RAPIDS RAFT
In the AI landscape of 2023, vector search is one of the hottest topics due to its applications in large language models (LLMs) and generative AI. Semantic...
11 MIN READ

Sep 06, 2023
GPUs for ETL? Optimizing ETL Architecture for Apache Spark SQL Operations
Running extract-transform-load (ETL) operations on GPUs at large scale with the NVIDIA RAPIDS Accelerator for Apache Spark can produce both cost savings...
8 MIN READ

Jul 17, 2023
GPUs for ETL? Run Faster, Less Costly Workloads with NVIDIA RAPIDS Accelerator for Apache Spark and Databricks
We were stuck. Really stuck. With a hard delivery deadline looming, our team needed to figure out how to process a complex extract-transform-load (ETL) job on...
7 MIN READ
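The two ETL posts above center on the NVIDIA RAPIDS Accelerator for Apache Spark. As a rough sketch of how the plugin is typically enabled from PySpark (the jar and data paths are illustrative, and cluster-specific setup covered in the posts is omitted):

```python
from pyspark.sql import SparkSession

# Minimal sketch, assuming the rapids-4-spark plugin jar is already available
# on the cluster; the jar and data paths below are illustrative.
spark = (
    SparkSession.builder
    .appName("gpu-etl-sketch")
    .config("spark.jars", "/opt/sparkRapidsPlugin/rapids-4-spark.jar")  # illustrative path
    .config("spark.plugins", "com.nvidia.spark.SQLPlugin")  # load the RAPIDS Accelerator
    .config("spark.rapids.sql.enabled", "true")  # run supported SQL operators on the GPU
    .getOrCreate()
)

# A typical ETL-style job; supported operators execute on the GPU when the
# plugin is active, and unsupported ones fall back to the CPU.
df = spark.read.parquet("/data/events.parquet")  # illustrative input
out = df.groupBy("user_id").count()
out.write.mode("overwrite").parquet("/data/events_by_user.parquet")
```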

Jul 11, 2023
Accelerated Data Analytics: Machine Learning with GPU-Accelerated Pandas and Scikit-learn
If you are looking to take your machine learning (ML) projects to new levels of speed and scalability, GPU-accelerated data analytics can help you deliver...
14 MIN READ
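The post above covers GPU-accelerated pandas- and scikit-learn-style workflows with RAPIDS. A minimal sketch of that pattern, assuming cuDF and cuML are installed (the file and column names are illustrative):

```python
import cudf
from cuml.linear_model import LinearRegression

# cuDF mirrors the pandas API, but the DataFrame lives in GPU memory.
df = cudf.read_csv("sales.csv")  # illustrative file
df = df.dropna(subset=["price", "units"])

X = df[["price"]]
y = df["units"]

# cuML mirrors the scikit-learn estimator API (fit/predict) on the GPU.
model = LinearRegression()
model.fit(X, y)
df["predicted_units"] = model.predict(X)
print(df.head())
```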

Jul 10, 2023
In-Game GPU Profiling for DirectX 12 Using SetBackgroundProcessingMode
If you are a DirectX 12 (DX12) game developer, you may have noticed that GPU times displayed in real time in your game HUD may change over time for a given...
4 MIN READ

Jun 28, 2023
Improving GPU Performance by Reducing Instruction Cache Misses
GPUs are specially designed to crunch through massive amounts of data at high speed. They have a large number of compute resources, called streaming...
11 MIN READ

Jun 07, 2023
Predicting Credit Defaults Using Time-Series Models with Recursive Neural Networks and XGBoost
Today’s machine learning (ML) solutions are complex and rarely use just a single model. Training models effectively requires large, diverse datasets that may...
12 MIN READ

Jun 05, 2023
CUDA 12.1 Supports Large Kernel Parameters
CUDA kernel function parameters are passed to the device through constant memory and have been limited to 4,096 bytes. CUDA 12.1 increases this parameter limit...
5 MIN READ

May 09, 2023
NVIDIA On-Demand: RAPIDS Sessions from GTC 2023
Get the latest best practices for accelerating your data science projects with RAPIDS.
1 MIN READ

May 05, 2023
NVIDIA On-Demand: Top Data Science Sessions from GTC 2023
Learn from experts about how to optimize a data pipeline or use machine learning for anomaly detection with these 15 educational sessions.
1 MIN READ

Apr 27, 2023
End-to-End AI for NVIDIA-Based PCs: Optimizing AI by Transitioning from FP32 to FP16
This post is part of a series about optimizing end-to-end AI. The performance of AI models is heavily influenced by the precision of the computational resources...
4 MIN READ
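The post above discusses moving inference from FP32 to FP16. As one illustrative conversion path for an ONNX model (not necessarily the exact workflow the post uses), the onnxconverter-common package can rewrite the model's tensors to half precision:

```python
import onnx
from onnxconverter_common import float16

# Load an FP32 ONNX model; the file names here are illustrative.
model_fp32 = onnx.load("model_fp32.onnx")

# Convert weights and tensor types to FP16. keep_io_types leaves the model's
# inputs and outputs in FP32 so existing calling code does not need to change.
model_fp16 = float16.convert_float_to_float16(model_fp32, keep_io_types=True)

onnx.save(model_fp16, "model_fp16.onnx")
```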

Apr 25, 2023
End-to-End AI for NVIDIA-Based PCs: ONNX and DirectML
This post is part of a series about optimizing end-to-end AI. While NVIDIA hardware can process the individual operations that constitute a neural network...
14 MIN READ

Mar 22, 2023
Reusable Computational Patterns for Machine Learning and Information Retrieval with RAPIDS RAFT
RAPIDS is a suite of accelerated libraries for data science and machine learning on GPUs: cuDF for pandas-like data structures, cuGraph for graph data, cuML for...
11 MIN READ

Mar 15, 2023
End-to-End AI for NVIDIA-Based PCs: NVIDIA TensorRT Deployment
This post is the fifth in a series about optimizing end-to-end AI. NVIDIA TensorRT is a solution for speed-of-light inference deployment on NVIDIA hardware....
10 MIN READ