Sergio Perez

Sergio Perez is a solution architect at NVIDIA who specializes in the training and inference of LLMs. Sergio works alongside AI developers in public supercomputing centers and in sectors such as energy, automotive, finance, telecommunications, and internet services. He has contributed to production applications of LLMs, including RAG systems, inference server optimization, pretraining LLMs from scratch, custom LLM evaluation, and quantization using FP8 formats. Sergio holds a Ph.D. in computational fluid dynamics from Imperial College London.

Posts by Sergio Perez

Generative AI

Benchmarking LLM Inference Costs for Smarter Scaling and Deployment

This is the third post in the large language model latency-throughput benchmarking series, which aims to teach developers how to determine the cost of LLM... 10 MIN READ
Generative AI

Floating-Point 8: An Introduction to Efficient, Lower-Precision AI Training

With the growth of large language models (LLMs), deep learning is advancing in both model architecture design and computational efficiency. Mixed precision... 11 MIN READ
Data Center / Cloud

Continued Pretraining of State-of-the-Art LLMs for Sovereign AI and Regulated Industries with Domyn and NVIDIA DGX Cloud

In recent years, large language models (LLMs) have achieved extraordinary progress in areas such as reasoning, code generation, machine translation, and... 17 MIN READ