Triton Inference Server

May 05, 2023
Why Automatic Augmentation Matters
Deep learning models require hundreds of gigabytes of data to generalize well on unseen samples. Data augmentation helps by increasing the variability of...
13 MIN READ

May 04, 2023
Increasing Throughput and Reducing Costs for AI-Based Computer Vision with CV-CUDA
Real-time cloud-scale applications that involve AI-based computer vision are growing rapidly. The use cases include image understanding, content creation,...
11 MIN READ

Apr 25, 2023
Increasing Inference Acceleration of KoGPT with NVIDIA FasterTransformer
Transformers are one of the most influential AI model architectures today and are shaping the direction of future AI R&D. First invented as a tool for...
6 MIN READ

Mar 29, 2023
Bootstrapping Object Detection Model Training with 3D Synthetic Data
Training AI models requires mountains of data. Acquiring large sets of training data can be difficult, time-consuming, and expensive. Also, the data collected...
12 MIN READ

Mar 23, 2023
Power Your AI Inference with New NVIDIA Triton and NVIDIA TensorRT Features
NVIDIA AI inference software consists of NVIDIA Triton Inference Server, open-source inference serving software, and NVIDIA TensorRT, an SDK for...
5 MIN READ

Mar 22, 2023
SDKs Accelerating Industry 5.0, Data Pipelines, Computational Science, and More Featured at NVIDIA GTC 2023
At NVIDIA GTC 2023, NVIDIA unveiled notable updates to its suite of NVIDIA AI software for developers to accelerate computing. The updates reduce costs in...
10 MIN READ

Mar 13, 2023
Serving ML Model Pipelines on NVIDIA Triton Inference Server with Ensemble Models
In many production-level machine learning (ML) applications, inference is not limited to running a forward pass on a single ML model. Instead, a pipeline of ML...
19 MIN READ

Feb 08, 2023
Speech AI Spotlight: How Pendulum Nabs Harmful Narratives Online
Over 55% of the global population uses social media, easily sharing online content with just one click. While connecting with others and consuming entertaining...
7 MIN READ

Jan 12, 2023
Autoscaling NVIDIA Riva Deployment with Kubernetes for Speech AI in Production
Speech AI applications, from call centers to virtual assistants, rely heavily on automatic speech recognition (ASR) and text-to-speech (TTS). ASR can process...
13 MIN READ

Dec 19, 2022
Deploying Diverse AI Model Categories from Public Model Zoo Using NVIDIA Triton Inference Server
Implementations of state-of-the-art (SOTA) models and modeling solutions are now widely available for frameworks such as TensorFlow, ONNX,...
12 MIN READ

Dec 08, 2022
Introducing NVIDIA Riva: A GPU-Accelerated SDK for Developing Speech AI Applications
This post was updated in March 2023. Speech AI is used in a variety of applications, including contact...
8 MIN READ

Nov 30, 2022
Designing an Optimal AI Inference Pipeline for Autonomous Driving
Self-driving cars must be able to detect objects quickly and accurately to ensure the safety of their drivers and other drivers on the road. Due to this need...
8 MIN READ

Nov 04, 2022
Deploying a 1.3B GPT-3 Model with NVIDIA NeMo Framework
Large language models (LLMs) are some of the most advanced deep learning algorithms that are capable of understanding written language. Many modern LLMs are...
11 MIN READ

Oct 25, 2022
Run Multiple AI Models on the Same GPU with Amazon SageMaker Multi-Model Endpoints Powered by NVIDIA Triton Inference Server
Last November, AWS integrated open-source inference serving software, NVIDIA Triton Inference Server, in Amazon SageMaker. Machine learning (ML) teams can use...
2 MIN READ

Sep 21, 2022
Solving AI Inference Challenges with NVIDIA Triton
Deploying AI models in production to meet the performance and scalability requirements of the AI-driven application while keeping the infrastructure costs low...
12 MIN READ

Sep 21, 2022
New SDKs Accelerating AI Research, Computer Vision, Data Science, and More
NVIDIA revealed major updates to its suite of AI software for developers including JAX, NVIDIA CV-CUDA, and NVIDIA RAPIDS. To learn about the latest SDK...
7 MIN READ