Maggie Zhang

Maggie Zhang is a senior solutions architect at NVIDIA, working on applications in generative AI, conversational AI, and computer vision. She received her PhD in computer science and engineering from the University of New South Wales in Australia, where she worked on GPU/CPU heterogeneous computing and compiler optimizations.
Avatar photo

Posts by Maggie Zhang

Conversational AI

Scaling LLMs with NVIDIA Triton and NVIDIA TensorRT-LLM Using Kubernetes

Large language models (LLMs) have been widely used for chatbots, content generation, summarization, classification, translation, and more. State-of-the-art LLMs... 16 MIN READ
Graphic with computer, cloud, and GPU icons
Conversational AI

Autoscaling NVIDIA Riva Deployment with Kubernetes for Speech AI in Production

Speech AI applications, from call centers to virtual assistants, rely heavily on automatic speech recognition (ASR) and text-to-speech (TTS). ASR can process... 13 MIN READ
Data Center / Cloud

Dividing NVIDIA A30 GPUs and Conquering Multiple Workloads

Multi-Instance GPU (MIG) is an important feature of NVIDIA H100, A100, and A30 Tensor Core GPUs, as it can partition a GPU into multiple instances. Each... 9 MIN READ
Data Center / Cloud

Accelerating AI Inference Workloads with NVIDIA A30 GPU

NVIDIA A30 GPU is built on the latest NVIDIA Ampere Architecture to accelerate diverse workloads like AI inference at scale, enterprise training, and HPC... 6 MIN READ
Data Center / Cloud

Deploying NVIDIA Triton at Scale with MIG and Kubernetes

NVIDIA Triton Inference Server is an open-source AI model serving software that simplifies the deployment of trained AI models at scale in production. Clients... 24 MIN READ
Data Science

Getting the Most Out of the NVIDIA A100 GPU with Multi-Instance GPU

With the third-generation Tensor Core technology, NVIDIA recently unveiled A100 Tensor Core GPU that delivers unprecedented acceleration at every scale for AI,... 18 MIN READ