Large language models (LLMs) are capable of recognizing, summarizing, translating, predicting, and generating content. Yet even the most powerful LLMs face limitations when working with specialized business knowledge, niche technical domains, or the diverse linguistic and cultural contexts of global operations.
Most models labeled as multilingual, for example, are trained mainly on English data, leaving gaps in accuracy, cultural nuance, and fairness. While retrieval-augmented generation (RAG) helps address some of these challenges, delivering accurate results for specific sectors and across languages requires deeper model adaptation.
This post introduces a new hands-on workshop, Adding New Knowledge to LLMs, available at NVIDIA GTC Paris. The NVIDIA Deep Learning Institute workshop equips developers with the skills to transform open source LLMs into highly capable, domain-specialized, and genuinely multilingual AI assets. Read on to learn more about the workshop and how to reserve your seat.
Why multilingual model evaluation matters
Model evaluation is critical for guiding model selection, development, and customization. It helps balance cost, latency, and quality throughout pretraining, fine-tuning, and inference. For LLMs, which rely on natural language interaction, multilingual evaluation is especially important.
For example, nearly half of Europeans use English as a second language, but millions still prefer to interact in their native tongues. Despite this, models such as Llama 2 are trained on less than 5% non-English data (Figure 1), and many other models are in a similar position. Without rigorous testing, labeling a model as multilingual can be misleading and result in costly deployment issues.

Challenges in multilingual model training and evaluation
This section outlines some of the challenges involved in training and evaluating multilingual AI models. How to address these challenges will be covered in more detail in the Adding New Knowledge to LLMs workshop at GTC Paris.
- Fragmented benchmarks: No shared, homogeneous dataset spans the 24 official EU languages, let alone additional local variants. Existing sets vary in task design and metrics, making scores hard to compare.
- Translation artifacts: Many benchmarks are machine-translated from English, introducing unnatural phrasing that skews results.
- Task imbalance: Discriminative tasks (multiple-choice, classification) dominate, while generative tasks (summarization, open-ended QA) lag behind, even though the latter power most real-world use cases.
- Metric pitfalls: Surface-level metrics such as BLEU and ROUGE penalize valid word-order variations, such as “The market is open today” versus “Today the market is open.” Aggregating heterogeneous metrics into one headline number amplifies bias.
- Comprehensive proficiency: True fluency can cover at least 10 dimensions: grammar, vocabulary, cultural competence, domain knowledge, discourse, bias, time relevance, dialectal variation, script handling, and long-form consistency. Current tests touch only a subset of these.
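The word-order pitfall above is easy to reproduce. The following toy clipped n-gram precision (the core ingredient of BLEU, simplified here for illustration) scores the reordered sentence perfectly at the unigram level but penalizes it at the bigram level, even though the meaning is identical:

```python
from collections import Counter

def ngram_precision(candidate, reference, n):
    """Clipped n-gram precision, the core ingredient of BLEU."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    matches = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
    total = sum(cand_ngrams.values())
    return matches / total if total else 0.0

ref = "The market is open today"
hyp = "Today the market is open"   # same meaning, different word order

print(ngram_precision(hyp, ref, 1))  # 1.0  -> every word matches
print(ngram_precision(hyp, ref, 2))  # 0.75 -> word order penalized
```

A metric that drops a quarter of the score for a perfectly valid reordering will systematically undervalue languages with freer word order, which is exactly why aggregate headline numbers can mislead.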
NVIDIA DLI workshop: Adding New Knowledge to LLMs
Adding New Knowledge to LLMs is a full-day, instructor-led workshop offered at GTC Paris. In the workshop, you’ll gain the skills needed to transform open source LLMs into domain-specialized, multilingual AI assets. You’ll work through the following four key tasks to master the complete lifecycle of AI model customization:
Task 1 – Systematic evaluation and dataset creation: Discover how to build custom evaluation benchmarks using NVIDIA NeMo Evaluator to precisely identify the limitations of an LLM, both in understanding specialized domain concepts and in its performance across various languages. You’ll track engineering progress effectively and learn to define metrics that capture what truly matters for your specific use case, whether it’s domain-specific accuracy or nuanced multilingual understanding.
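To make the idea concrete, here is a minimal hand-rolled sketch of a per-language evaluation harness (this is illustrative plain Python, not the NeMo Evaluator API; `model_answer`, `stub`, and the toy benchmark are placeholders). Reporting accuracy per language, rather than one aggregate number, is what makes multilingual gaps visible:

```python
from collections import defaultdict

def evaluate(model_answer, benchmark):
    """Score exact-match accuracy per language so gaps stay visible,
    rather than hiding them behind one aggregate number."""
    correct, total = defaultdict(int), defaultdict(int)
    for item in benchmark:
        lang = item["lang"]
        total[lang] += 1
        if model_answer(item["question"]) == item["answer"]:
            correct[lang] += 1
    return {lang: correct[lang] / total[lang] for lang in total}

# Tiny toy benchmark and a stub "model" for demonstration
benchmark = [
    {"lang": "en", "question": "2+2", "answer": "4"},
    {"lang": "fr", "question": "2+3", "answer": "5"},
    {"lang": "fr", "question": "3+3", "answer": "6"},
]
stub = lambda q: {"2+2": "4", "2+3": "5"}.get(q, "?")
print(evaluate(stub, benchmark))  # {'en': 1.0, 'fr': 0.5}
```

The same principle extends to domain-specific slices: scoring each slice separately tells you where fine-tuning effort should go.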
Task 2 – Advanced data curation: Implement state-of-the-art data cleaning and preparation pipelines with NeMo Curator. You’ll learn to assemble high-quality datasets tailored to your unique requirements, encompassing both specialized domain-specific information and diverse multilingual content. This includes strategies for sourcing and integrating niche data, as well as handling the complexities of multiple languages, scripts, and cultural contexts.
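As a rough sketch of what such a pipeline does, the snippet below hand-rolls two common curation steps, exact deduplication after Unicode normalization and a minimum-length filter. NeMo Curator provides scalable, GPU-accelerated versions of these stages; the function here is purely illustrative and not its API:

```python
import hashlib
import unicodedata

def curate(docs, min_words=3):
    """Drop exact duplicates (case-insensitive, NFC-normalized) and fragments."""
    seen, kept = set(), []
    for text in docs:
        norm = unicodedata.normalize("NFC", text.strip())
        digest = hashlib.sha256(norm.lower().encode()).hexdigest()
        if digest in seen:                  # exact duplicate
            continue
        if len(norm.split()) < min_words:   # too-short fragment
            continue
        seen.add(digest)
        kept.append(norm)
    return kept

docs = [
    "Le marché est ouvert aujourd'hui.",
    "le marché est ouvert aujourd'hui.",   # case-variant duplicate
    "ok",                                   # too short
    "The market is open today.",
]
print(curate(docs))   # keeps the first French sentence and the English one
```

Normalizing before hashing matters for multilingual corpora, where the same accented character can be encoded in several byte sequences.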
Task 3 – Targeted knowledge injection: Master a range of powerful adaptation techniques to effectively infuse your LLM with new knowledge and capabilities. You’ll explore how you can significantly enhance your model’s expertise and global reach with:
- In-context learning
- Parameter-efficient fine-tuning (PEFT)
- Continued pretraining (CPT) with domain-specific and multilingual corpora
- Supervised fine-tuning (SFT) for targeted tasks
- Direct preference optimization (DPO) based on domain expert and multilingual preferences
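The lightest-weight technique in the list, in-context learning, injects new knowledge purely through the prompt, with no weight updates. A minimal sketch of a few-shot prompt builder (the prompt format and example questions are illustrative assumptions, not a prescribed template):

```python
def few_shot_prompt(examples, query):
    """Assemble a few-shot prompt: instruction, worked examples, then the query."""
    lines = ["Answer in the same language as the question.\n"]
    for question, answer in examples:
        lines.append(f"Q: {question}\nA: {answer}\n")
    lines.append(f"Q: {query}\nA:")
    return "\n".join(lines)

# Multilingual demonstrations steer the model toward answering in kind
examples = [
    ("Quelle est la capitale de la France ?", "Paris"),
    ("Wie heißt die Hauptstadt Deutschlands?", "Berlin"),
]
prompt = few_shot_prompt(examples, "¿Cuál es la capital de España?")
print(prompt)
```

The other techniques in the list (PEFT, CPT, SFT, DPO) update model weights and trade higher cost for deeper, persistent adaptation; the workshop covers when each is the right tool.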
Task 4 – Model optimization for domain and language: Apply advanced distillation, quantization, and pruning techniques using NVIDIA NeMo Model Optimizer and NVIDIA TensorRT-LLM. The focus will be on dramatically reducing inference costs and improving operational efficiency, while ensuring that these optimizations preserve high performance on your specialized domain tasks and maintain robust capabilities across all targeted languages, including low-resource ones.
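To illustrate the trade-off quantization makes, here is a toy symmetric int8 quantization of a single weight vector in plain Python (a pedagogical sketch of the general idea, not the TensorRT-LLM or Model Optimizer API). Each float is mapped to an 8-bit code plus a shared scale, cutting memory roughly 4x at the cost of a bounded rounding error:

```python
def quantize_int8(weights):
    """Symmetric int8 quantization: map floats to [-127, 127] codes
    plus one shared scale factor, then reconstruct for comparison."""
    scale = max(abs(w) for w in weights) / 127.0
    codes = [round(w / scale) for w in weights]       # int8 codes
    dequant = [c * scale for c in codes]              # reconstructed floats
    return codes, dequant, scale

w = [0.12, -0.50, 0.33, 0.01]
codes, dequant, scale = quantize_int8(w)
print(codes)                  # [30, -127, 84, 3]
max_err = max(abs(a - b) for a, b in zip(w, dequant))
print(max_err <= scale / 2)   # True: error bounded by half a quantization step
```

Whether that rounding error is acceptable is exactly what must be re-verified per domain and per language after optimization, since low-resource languages often sit closer to the model’s capability margins.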
By completing the course, you’ll have the skills to develop, deploy, and operate AI systems that are tailored to your specific domain requirements and are also genuinely multilingual, ready to deliver more accurate, relevant, and culturally resonant experiences to a global audience.
Real-world impact: advancing multilingual AI
NVIDIA collaborates with organizations worldwide to develop improved datasets and models with robust multilingual capabilities, and partners are already seeing impactful results. For instance, collaborations with groups like the Barcelona Supercomputing Centre have led to significant improvements in language-specific task accuracy.
Similarly, partnerships with initiatives such as EuroLLM have led to the development of powerful multilingual AI models like EuroLLM 9B Instruct, which supports all 24 official EU languages and excels at tasks such as question answering, summarization, and translation across diverse linguistic markets. These joint efforts are a key part of advancing multilingual AI. Join the workshop to explore the pipelines that make such advancements possible.
Join us at NVIDIA GTC Paris
The journey to mastering domain-specific and multilingual AI begins at NVIDIA GTC Paris. To get started with hands-on experience, reserve your seat to attend the workshop, Adding New Knowledge to LLMs.
Ready for more? Check out these related GTC Paris sessions:
Sovereign AI in Practice: Building, Evaluating, and Scaling Multilingual LLMs [CWEP1103]: NVIDIA experts explain how to enrich language models with new knowledge—expanding their capabilities in specialized business, engineering, or scientific domains, and adjusting adaptation to new languages, cultures, and values, even when foundational understanding is initially lacking.
Building and Customizing AI Models for European Applications: From Foundation to Fine-Tuning [GP1046]: This panel discussion explores the vision and strategic frameworks for constructing sovereign LLMs attuned to Europe’s distinct cultural, economic, and societal fabric, featuring insights from leading European model builders like BSC and EuroLLM, alongside practical applications from ThinkDeep.