Triton Inference Server
Dec 05, 2024
Spotlight: Perplexity AI Serves 400 Million Search Queries a Month Using NVIDIA Inference Stack
The demand for AI-enabled services continues to grow rapidly, placing increasing pressure on IT and infrastructure teams. These teams are tasked with...
7 MIN READ
Oct 29, 2024
AI-Powered Devices Track Howls to Save Wolves
A new cell-phone-sized device—which can be deployed in vast, remote areas—is using AI to identify and geolocate wildlife to help conservationists track...
5 MIN READ
Oct 28, 2024
Supercharging Fraud Detection in Financial Services with Graph Neural Networks
Fraud in financial services is a massive problem. According to NASDAQ, in 2023, banks faced $442 billion in projected losses from payments, checks, and credit...
9 MIN READ
Oct 22, 2024
Scaling LLMs with NVIDIA Triton and NVIDIA TensorRT-LLM Using Kubernetes
Large language models (LLMs) have been widely used for chatbots, content generation, summarization, classification, translation, and more. State-of-the-art LLMs...
16 MIN READ
Oct 01, 2024
Evolving AI-Powered Game Development with Retrieval-Augmented Generation
Game development is a complex and resource-intensive process, particularly when using advanced tools like Unreal Engine. Developers find themselves navigating...
6 MIN READ
Sep 18, 2024
Event: Developer Day for Financial Services
Join this virtual developer day to learn how AI and Machine Learning can revolutionize fraud detection and financial crime prevention.
1 MIN READ
Aug 28, 2024
NVIDIA Triton Inference Server Achieves Outstanding Performance in MLPerf Inference 4.1 Benchmarks
Six years ago, we embarked on a journey to develop an AI inference serving solution specifically designed for high-throughput and time-sensitive production use...
8 MIN READ
Aug 28, 2024
NVIDIA Blackwell Platform Sets New LLM Inference Records in MLPerf Inference v4.1
Large language model (LLM) inference is a full-stack challenge. Powerful GPUs, high-bandwidth GPU-to-GPU interconnects, efficient acceleration libraries, and a...
13 MIN READ
Aug 21, 2024
Practical Strategies for Optimizing LLM Inference Sizing and Performance
As the use of large language models (LLMs) grows across many applications, such as chatbots and content creation, it's important to understand the process of...
2 MIN READ
Aug 06, 2024
Accelerating Hebrew LLM Performance with NVIDIA TensorRT-LLM
Developing a high-performing Hebrew large language model (LLM) presents distinct challenges stemming from the rich and complex nature of the Hebrew language...
8 MIN READ
Aug 01, 2024
Measuring Generative AI Model Performance Using NVIDIA GenAI-Perf and an OpenAI-Compatible API
NVIDIA offers tools like Perf Analyzer and Model Analyzer to assist machine learning engineers with measuring and balancing the trade-off between latency and...
6 MIN READ
Jul 16, 2024
New Workshops: Customize LLMs, Build and Deploy Large Neural Networks
Register now for an instructor-led public workshop in July, August or September. Space is limited.
1 MIN READ
Jul 08, 2024
Deploy Multilingual LLMs with NVIDIA NIM
Multilingual large language models (LLMs) are increasingly important for enterprises operating in today's globalized business landscape. As businesses expand...
9 MIN READ
Jul 02, 2024
Achieving High Mixtral 8x7B Performance with NVIDIA H100 Tensor Core GPUs and NVIDIA TensorRT-LLM
As large language models (LLMs) continue to grow in size and complexity, the performance requirements for serving them quickly and cost-effectively continue to...
9 MIN READ
Jun 14, 2024
Level Up Your Skills with Five New NVIDIA Technical Courses
With AI introducing an unprecedented pace of technological innovation, staying ahead means keeping your skills up to date. The NVIDIA Developer Program gives...
4 MIN READ
May 17, 2024
Enhancing the Apparel Shopping Experience with AI, Emoji-Aware OCR, and Snapchat's Screenshop
Ever spotted someone in a photo wearing a cool shirt or some unique apparel and wondered where they got it? How much did it cost? Maybe you've even thought...
8 MIN READ