TensorRT

Dec 04, 2023
NVIDIA TensorRT-LLM Enhancements Deliver Massive Large Language Model Speedups on NVIDIA H200
Large language models (LLMs) have seen dramatic growth over the last year, and the challenge of delivering great user experiences depends on both high-compute...
5 MIN READ

Dec 04, 2023
New NVIDIA NeMo Framework Features and NVIDIA H200 Supercharge LLM Training Performance and Versatility
The rapid growth in the size, complexity, and diversity of large language models (LLMs) continues to drive an insatiable need for AI training performance....
9 MIN READ

Nov 24, 2023
Explainer: What Is Retrieval-Augmented Generation aka RAG?
Retrieval-augmented generation (RAG) is a technique for enhancing the accuracy and reliability of generative AI models with facts fetched from external sources.
1 MIN READ

Nov 17, 2023
Mastering LLM Techniques: Inference Optimization
Stacking transformer layers to create large models results in better accuracies, few-shot learning capabilities, and even near-human emergent abilities on a...
25 MIN READ

Nov 13, 2023
Upcoming Webinar Series: How to Get Started With AI Inference
Join us for a series of expert-led talks where we'll explore a full-stack approach to AI inference and how to optimize the AI-inferencing workflow to lower...
1 MIN READ

Nov 07, 2023
Getting Started with Large Language Models for Enterprise Solutions
Large language models (LLMs) are deep learning algorithms with hundreds of billions of parameters that are trained on Internet-scale datasets. LLMs can read,...
12 MIN READ

Oct 19, 2023
Bringing Generative AI to Life with NVIDIA Jetson
Recently, NVIDIA unveiled Jetson Generative AI Lab, which empowers developers to explore the limitless possibilities of generative AI in a real-world setting...
11 MIN READ

Oct 19, 2023
Optimizing Inference on Large Language Models with NVIDIA TensorRT-LLM, Now Publicly Available
Today, NVIDIA announces the public release of TensorRT-LLM to accelerate and optimize inference performance for the latest LLMs on NVIDIA GPUs. This open-source...
10 MIN READ

Sep 14, 2023
Software-Defined Broadcast with NVIDIA Holoscan for Media
The broadcast industry is undergoing a transformation in how content is created, managed, distributed, and consumed. This transformation includes a shift from...
5 MIN READ

Sep 09, 2023
NVIDIA TensorRT-LLM Supercharges Large Language Model Inference on NVIDIA H100 GPUs
Large language models (LLMs) offer incredible new capabilities, expanding the frontier of what is possible with AI. However, their large size and unique...
9 MIN READ

Aug 31, 2023
Deploying YOLOv5 on NVIDIA Jetson Orin with cuDLA: Quantization-Aware Training to Inference
NVIDIA Jetson Orin is the best-in-class embedded platform for AI workloads. One of the key components of the Orin platform is the second-generation Deep...
11 MIN READ

Aug 08, 2023
Develop and Deploy Scalable Generative AI Models Seamlessly with NVIDIA AI Workbench
Developing custom generative AI models and applications is a journey, not a destination. It begins with selecting a pretrained model, such as a Large Language...
11 MIN READ

Jul 07, 2023
Train Your AI Model Once and Deploy on Any Cloud with NVIDIA and Run:ai
Organizations are increasingly adopting hybrid and multi-cloud strategies to access the latest compute resources, consistently support worldwide customers, and...
7 MIN READ

Jul 06, 2023
New MLPerf Inference Network Division Showcases NVIDIA InfiniBand and GPUDirect RDMA Capabilities
In MLPerf Inference v3.0, NVIDIA made its first submissions to the newly introduced Network division, which is now part of the MLPerf Inference Datacenter...
9 MIN READ

May 16, 2023
Sparsity in INT8: Training Workflow and Best Practices for NVIDIA TensorRT Acceleration
The training stage of deep learning (DL) models consists of learning numerous dense floating-point weight matrices, which results in a massive amount of...
12 MIN READ

May 04, 2023
Increasing Throughput and Reducing Costs for AI-Based Computer Vision with CV-CUDA
Real-time cloud-scale applications that involve AI-based computer vision are growing rapidly. The use cases include image understanding, content creation,...
11 MIN READ