Triton Inference Server

Dec 18, 2024

A Guide to Retrieval-Augmented Generation for AEC

Large language models (LLMs) are rapidly changing the business landscape, offering new capabilities in natural language processing (NLP), content generation,...

12 MIN READ

Dec 05, 2024

Spotlight: Perplexity AI Serves 400 Million Search Queries a Month Using NVIDIA Inference Stack

The demand for AI-enabled services continues to grow rapidly, placing increasing pressure on IT and infrastructure teams. These teams are tasked with...

7 MIN READ

Oct 29, 2024

AI-Powered Devices Track Howls to Save Wolves

A new cell-phone-sized device—which can be deployed in vast, remote areas—is using AI to identify and geolocate wildlife to help conservationists track...

5 MIN READ

Oct 22, 2024

Scaling LLMs with NVIDIA Triton and NVIDIA TensorRT-LLM Using Kubernetes

Large language models (LLMs) have been widely used for chatbots, content generation, summarization, classification, translation, and more. State-of-the-art...

16 MIN READ

Oct 01, 2024

Evolving AI-Powered Game Development with Retrieval-Augmented Generation

Game development is a complex and resource-intensive process, particularly when using advanced tools like Unreal Engine. Developers find themselves navigating...

6 MIN READ

Sep 18, 2024

Event: Developer Day for Financial Services

Join this virtual developer day to learn how AI and machine learning can revolutionize fraud detection and financial crime prevention.

1 MIN READ

Aug 28, 2024

NVIDIA Triton Inference Server Achieves Outstanding Performance in MLPerf Inference 4.1 Benchmarks

Six years ago, we embarked on a journey to develop an AI inference serving solution specifically designed for high-throughput and time-sensitive production use...

8 MIN READ

Aug 28, 2024

NVIDIA Blackwell Platform Sets New LLM Inference Records in MLPerf Inference v4.1

Large language model (LLM) inference is a full-stack challenge. Powerful GPUs, high-bandwidth GPU-to-GPU interconnects, efficient acceleration libraries, and a...

13 MIN READ

Aug 21, 2024

Practical Strategies for Optimizing LLM Inference Sizing and Performance

As the use of large language models (LLMs) grows across many applications, such as chatbots and content creation, it's important to understand the process of...

2 MIN READ

Aug 01, 2024

Measuring Generative AI Model Performance Using NVIDIA GenAI-Perf and an OpenAI-Compatible API

NVIDIA offers tools like Perf Analyzer and Model Analyzer to assist machine learning engineers with measuring and balancing the trade-off between latency and...

6 MIN READ

Jul 16, 2024

New Workshops: Customize LLMs, Build and Deploy Large Neural Networks

1 MIN READ

Jul 08, 2024

Deploy Multilingual LLMs with NVIDIA NIM

Multilingual large language models (LLMs) are increasingly important for enterprises operating in today's globalized business landscape. As businesses expand...

9 MIN READ

Jul 02, 2024

Achieving High Mixtral 8x7B Performance with NVIDIA H100 Tensor Core GPUs and NVIDIA TensorRT-LLM

As large language models (LLMs) continue to grow in size and complexity, the performance requirements for serving them quickly and cost-effectively continue to...

9 MIN READ

May 17, 2024

Enhancing the Apparel Shopping Experience with AI, Emoji-Aware OCR, and Snapchat's Screenshop

Ever spotted someone in a photo wearing a cool shirt or some unique apparel and wondered where they got it? How much did it cost? Maybe you've even thought...

8 MIN READ

Apr 28, 2024

Turbocharging Meta Llama 3 Performance with NVIDIA TensorRT-LLM and NVIDIA Triton Inference Server

We're excited to announce support for the Meta Llama 3 family of models in NVIDIA TensorRT-LLM, accelerating and optimizing your LLM inference performance. You...

9 MIN READ

Mar 18, 2024

Translate Your Enterprise Data into Actionable Insights with NVIDIA NeMo Retriever

Across every industry, and every job function, generative AI is activating the potential within organizations—turning data into knowledge and empowering...

9 MIN READ