Posts by Neal Vaidya
Generative AI / LLMs
Jun 07, 2024
Seamlessly Deploying a Swarm of LoRA Adapters with NVIDIA NIM
The latest state-of-the-art foundation large language models (LLMs) have billions of parameters and are pretrained on trillions of tokens of input text. They...
11 MIN READ
Generative AI / LLMs
Apr 28, 2024
Turbocharging Meta Llama 3 Performance with NVIDIA TensorRT-LLM and NVIDIA Triton Inference Server
We're excited to announce support for the Meta Llama 3 family of models in NVIDIA TensorRT-LLM, accelerating and optimizing your LLM inference performance. You...
9 MIN READ
Generative AI / LLMs
Mar 18, 2024
NVIDIA NIM Offers Optimized Inference Microservices for Deploying AI Models at Scale
The rise in generative AI adoption has been remarkable. Catalyzed by the launch of OpenAI’s ChatGPT in 2022, the new technology amassed over 100M users within...
6 MIN READ
Generative AI / LLMs
Nov 17, 2023
Mastering LLM Techniques: Inference Optimization
Stacking transformer layers to create large models results in better accuracy, few-shot learning capabilities, and even near-human emergent abilities on a...
25 MIN READ
Generative AI / LLMs
Nov 15, 2023
NVIDIA AI Foundation Models: Build Custom Enterprise Chatbots and Co-Pilots with Production-Ready LLMs
Large language models (LLMs) are revolutionizing data science, enabling advanced capabilities in natural language understanding, AI, and machine learning....
12 MIN READ
Generative AI / LLMs
Oct 19, 2023
Optimizing Inference on Large Language Models with NVIDIA TensorRT-LLM, Now Publicly Available
Today, NVIDIA announces the public release of TensorRT-LLM to accelerate and optimize inference performance for the latest LLMs on NVIDIA GPUs. This open-source...
10 MIN READ