Grace Ho is a senior deep learning performance architect at NVIDIA on the inference benchmarking team, where she focuses on LLM inference. She holds a Ph.D. and a Master's degree in Computer Engineering from Carnegie Mellon University, and a B.S. in Electrical Engineering from Stanford University.