Agentic AI / Generative AI

Leverage the Latest Open Models for Synthetic Data Generation with NVIDIA Nemotron-4-340B

Aug 16, 2024

By Chris Alexiuk, Shashank Verma and Vivienne Zhang

Discuss (1)

AI-Generated Summary

Dislike

The Nemotron-4-340B family of models, released by NVIDIA, includes a base model, an Instruct model, and a Reward model, all available under the NVIDIA Open Model License for personal, research, and commercial use.
The Nemotron-4-340B-Reward model is a state-of-the-art multidimensional reward model that evaluates responses based on attributes like helpfulness, correctness, and coherence, achieving a score of 92.2 on Reward Bench.
NVIDIA's synthetic data generation (SDG) pipeline uses the Nemotron-4-340B-Instruct model to generate synthetic responses and the Nemotron-4-340B-Reward model to rank and filter them, enabling the creation of high-quality training data.

AI-generated content may summarize information incompletely. Verify important information. Learn more

The Llama-3.1-Nemotron 70B-Reward model helps generate high-quality training data that aligns with human preferences for finance, retail, healthcare, scientific research, telecommunications, and sovereign AI.

This post was updated on August 16, 2024 to reflect the most recent Reward Bench results.

Since the introduction and subsequent wide adoption of large language models (LLMs), data has been the lifeblood of businesses building accurate and safe AI systems. A company’s data represents its cumulative knowledge and can be leveraged in various ways, from customization (supervised fine-tuning (SFT), Parameter-Efficient Fine-Tuning (PEFT), continued pretraining, and so on) to training brand-new, domain-specific small language models (SLMs).

Data, while being one of the most critical pieces of a modern AI pipeline, has traditionally been costly and limiting during the development of innovative LLMs and SLMs, including paying human annotators and navigating the sourcing of large volumes of domain-specific data. The current process of generating high-quality data is a difficult task.

Through a process called synthetic data generation (SDG), defined later in this post, businesses can augment existing data stores by using LLMs to create customized high-quality data in large volumes.

NVIDIA is announcing a new suite of models specifically built for SDG: the Nemotron-4-340B family of models, including a state-of-the-art base model, and an Instruct model to aid in SDG. All are released under a permissive license that enables businesses and developers to use the model outputs and build incredible models.

NVIDIA Open Model License

With the release of the Nemotron-4-340B family of models, which includes base, instruct, and reward models, NVIDIA introduces the NVIDIA Open Model License, a permissive license that allows the distribution, modification, and use of the Nemotron-4-340B models and their outputs for personal, research, and commercial use, without attribution requirements.

Nemotron-4-340B-Reward model

The Nemotron-4-340B-Reward model is a state-of-the-art multidimensional reward model. The model takes a text prompt as input and returns a list of floating point numbers that are associated with the five attributes in the HelpSteer2 dataset (Figure 1).

The model has been evaluated using Reward Bench and shown to achieve benchmark-topping performance despite containing only 10K human-annotated response pairs.

Given a prompt, a reward model provides a score for a response according to human preference. In other words, it can align with human preferences for a given prompt and can replace a large number of human annotations.

Nemotron-4-340B-Reward has led Reward Bench for two months, with a current overall score of 92.2. Notably, Nemotron-4-340B Reward has the most significant lead in Chat-Hard, beating the next-best alternative by more than 10 points. Chat-Hard is a subset of the test data that evaluates “a reward model’s abilities to understand trick questions and subtly different instruction responses.” For more information, see RewardBench: Evaluating Reward Models for Language Modeling.

HelpSteer2 dataset

With the release of Nemotron-4-340B-Reward, we also introduced HelpSteer2. This dataset is permissively licensed (CC-BY-4.0) with 10K response pairs. Each prompt in the dataset contains two responses that are human-annotated using a Likert-5 Scale (from 0–4, with higher meaning better) for the following attributes:

Helpfulness: Overall helpfulness of the response to the prompt.
Correctness: Inclusion of all pertinent facts without errors.
Coherence: Consistency and clarity of expression.
Complexity: Intellectual depth required to write a response (that is, whether the response can be written by anyone with basic language competency or whether it requires deep domain expertise).
Verbosity: Amount of detail included in the response, relative to what is asked for in the prompt.

The dataset is focused on conversational data, including multi-turn conversations in the English language.

For more information, see HelpSteer2: Open-source dataset for training top-performing reward models.

SteerLM reward model training

The Nemotron-3-340B reward model was created by aligning the base model with the HelpSteer2 dataset using NeMo Aligner, a toolkit for efficient model alignment.

The Nemotron-4-340B-Reward model was trained on the Nemotron-4-340B-Base model. It has an additional linear layer that converts the final layer representation of the end-of-response token into five scalar values that correspond to a HelpSteer attribute, referred to as SteerLM reward model training. For more information about the training process, see HelpSteer2: Open-source dataset for training top-performing reward models.

Unlike binary preference-based methods, the SteerLM reward model training process enables the model to provide more expressive feedback on which responses are considered good and why. Where binary-trained reward models might sometimes conflate a long response with a good response, SteerLM reward model training explicitly teaches the model to disambiguate verbosity as a scored attribute.

Synthetic data generation overview

SDG refers to the process of creating datasets that can be used for a variety of model customizations, from SFT, PEFT including Low-Rank Adaptation (LoRA), and model alignment (using methods like RLAIF, DPO, and so on).

Use cases for SDG are not limited to model alignment but can apply to a wide range of applications, from retrieval and evaluation dataset curation to recommender systems. For this post, we focus on model alignment as the primary use case for the Nemotron-4-340B family of models.

Alignment training is a rapidly growing subdiscipline in the generative AI domain and can be implemented in several different ways. Out of all the existing methods, we discuss a specific implementation of an SDG pipeline in this post.

Critically, robust SDG methods go beyond just generating response data. They also include verification and checks to ensure ‌that data quality remains high. LLM accuracy is often directly determined by the quality rather than quantity of the training data, making the step of quality filtering crucial in SDG recipes.

A synthetic data generation flow

Figure 3 shows two steps at a high level:

Generating synthetic responses using the Nemotron-4-340B-Instruct model.
Ranking the synthetic responses using the Nemotron-4-340B-Reward model and filtering the synthetic responses to retain only high-quality samples.

Synthetic response generation

Synthetic response data can be generated by giving Nemotron-4-340B-Instruct domain-specific input queries. This enables the model to generate responses that are aligned with the input query in a format similar to those used in the Instruction Tuning with GPT-4 paper. These responses can be generated with a zero-shot, few-shot, or chain-of-thought style prompt, depending on the desired response format. Multiple responses to each query can be generated for filtering in the next step as well, if required.

The Nemotron-4-40B-Instruct model can also be used to generate domain-specific queries initially, alleviating the need for a dataset of preestablished queries. However, this use case is not covered in the tutorial material.

Reward model verification

Due to the multi-attribute nature of Nemotron-4-340B-Reward, synthetic responses can be ranked by the most desired HelpSteer2 attributes so that only the highest-performing responses are kept. This emulates the process of human evaluation of prompt quality and adds a layer of quality monitoring in SDG pipelines.

Case study

NVIDIA researchers demonstrated the effectiveness of SDG in the HelpSteer2 paper. A total of 100K rows of conversational synthetic data (“Daring Anteater” or “DA” in Figure 4) were created through the pipeline.

Using this dataset, the NVIDIA research team aligned Llama-3-70B (base model) to match or exceed Llama-3-70B-Instruct on several standard benchmarks. This was achieved despite using only 1% of the human-annotated data with which the Llama-3-70B-Instruct model was trained.

The results showcase the effectiveness of SDG and how using tools like Nemotron-4-340B-Reward and Nemotron-4-340B-Instruct can be used to add value to your data pipelines today.

There are many SDG pipelines and this is still an active topic of research. Nemotron-4-340B-Instruct was itself trained with a variation of the SDG pipeline similar to the flow in Figure 3, with 98% of its alignment training data being synthetically generated. For more information, see the Nemotron-4-340B report.

We encourage you to evaluate and develop different pipelines and share best practices, as we continue to refine our own SDG methodologies.

Conclusion

Data serves as the backbone of LLMs. Recognizing SDG as the next frontier of improving generative AI applications for enterprises, NVIDIA offers the Nemotron-4-340B family of models and SDG pipeline to enable developers and enterprises alike to turbocharge a wide range of synthetic data use cases, with a permissive license and several the highest-quality, openly available instruct and reward models.

Instructions for how to deploy the models are available on their respective model cards, with NeMo Framework instructions available for Nemotron-4-340B-Base and Nemotron-4-340B-Instruct, and NeMo Aligner instructions available for Nemotron-4-340B-Reward.

A tutorial demonstrating the above SDG pipeline using the build.nvidia.com API is available in the /NVIDIA/NeMo-Curator GitHub repo.

In a future release, you’ll be able to use the Nemotron-4-340B NVIDIA NIM microservice for optimized inference on NVIDIA GPUs.

Try out Nemotron-4-340B-Instruct through the preview inference API.

Discuss (1)

About the Authors

About Chris Alexiuk
Chris Alexiuk is a deep learning developer advocate at NVIDIA, working on creating technical assets that help developers use the incredible suite of AI tools available at NVIDIA. Chris comes from a machine learning and data science background, and he is obsessed with everything and anything about large language models.

View all posts by Chris Alexiuk

About Shashank Verma
Shashank Verma is a product research engineering manager at NVIDIA, where he leads the development and presentation of developer-focused content and proof-of-concept applications using the latest AI frameworks and platforms. He holds a master’s in Electrical Engineering from the University of Wisconsin-Madison, specializing in computer vision, security aspects in data science, and high-performance computing. Shashank is passionate about making advanced AI accessible by translating complex concepts into practical solutions for the developer community.

View all posts by Shashank Verma

About Vivienne Zhang
Vivienne Zhang is a deep learning software product manager at NVIDIA, focused on developing large language models and helping enterprises use NeMo to deploy customized LLM applications. She holds an MS in Computer Science from Massachusetts Institute of Technology. Before MIT, she worked in environmental sciences and obtained her BA from Yale University.

View all posts by Vivienne Zhang