
Deploy GPU-Optimized AI Software with One Click Using Brev.dev and NVIDIA NGC Catalog

Brev.dev is making it easier to develop AI solutions by leveraging software libraries, frameworks, and Jupyter Notebooks on the NVIDIA NGC catalog.

You can use Brev.dev to easily deploy software on an NVIDIA GPU by pairing a cloud orchestration tool with a simple UI. Get an on-demand GPU reliably from any cloud, access the notebook in-browser, or SSH directly into the machine with the Brev CLI.

The NGC catalog is a hub for secure GPU-optimized software, including containers, models, and Jupyter Notebooks. The catalog simplifies the process of developing and deploying AI solutions, accelerating time to market for enterprises.

Solution highlights

This integration solves many challenges that are often involved in launching GPU instances in the cloud and getting an integrated development environment (IDE), such as setting up the underlying software stack, provisioning the right compute, and managing SSH keys. With just one click, you can now seamlessly deploy various software from the NGC catalog to your preferred compute and IDE. 

Key features of this solution include:

  • 1-click deploy: Streamline deployment of NVIDIA AI software without needing GPU quota, provisioning expertise, or manual setup. A process that is otherwise highly manual and often takes hours is reduced to just 2-3 minutes.
  • Deploy anywhere: Brev’s API serves as a unified interface spanning on-premises data centers, public cloud providers, and hybrid and private clouds, empowering you to deploy NVIDIA software easily across any environment. Avoid lock-in to any single compute source by adding your own cloud to Brev.
  • Simplified setup process: Brev’s open-source container tool, Verb, reliably installs CUDA and Python on any GPU and resolves dependency issues, saving valuable time.
  • Secure networking: With Brev’s CLI tool, you can effortlessly manage SSH keys, securely connecting to any compute source without dealing with cumbersome IPs or PEM files. 
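For example, once the Brev CLI is installed, connecting to an instance takes only a couple of commands. The following is a minimal sketch; the instance name my-instance is a placeholder, and CLI subcommands may change between releases:

brev login              # authenticate once
brev ls                 # list your instances
brev shell my-instance  # SSH in; keys are managed for you, no IPs or PEM files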

Fine-tuning a Mistral Jupyter Notebook

This section walks through an example Jupyter Notebook focused on fine-tuning large language models (LLMs). Specifically, it explores how to use NVIDIA NeMo to train, evaluate, and test Mistral 7B for a question-answer task. 

NVIDIA NeMo is an end-to-end platform for developing custom generative AI, anywhere. It includes tools for data curation, training, retrieval-augmented generation (RAG), and guardrailing, offering enterprises an easy, cost-effective, and efficient way to adopt generative AI.

With Brev’s 1-click deployment integration, you can quickly access a GPU and start customizing generative AI models with NeMo. The required software stack, including NeMo, is set up by Brev’s platform, freeing you to focus on AI development. You can now quickly leverage a fully GPU-accelerated stack without worrying about infrastructure management overhead.

Step 1: Setup prerequisites

To start, get the notebook from the NGC catalog. Once it’s deployed on Brev, you can access it from your browser and start executing the different code blocks. 

Select the 1-click deploy button on the NGC catalog page. You’ll be directed to the fine-tuning notebook on Brev (Figure 1).

Figure 1. The Mistral Jupyter Notebook on the NGC catalog

Next, click the Deploy Notebook button (Figure 2). If you’re new to Brev, you’ll be prompted to create an account first, after which you’ll need to click the Deploy Notebook button again.

Figure 2. The Mistral Jupyter Notebook on Brev

Wait for the instance to finish setup and configuration. When JupyterLab is ready on the NVIDIA A100 Tensor Core GPU instance, click Access Notebook.

Step 2: Prepare the base model

Next, download Mistral 7B into your instance and convert it to .nemo format. Use the following commands to pull the model from Hugging Face and run the conversion script.

!pip install ipywidgets
!jupyter nbextension enable --py widgetsnbextension
!mkdir -p models/mistral7b

# Authenticate with Hugging Face (paste your access token) and download the base model
import huggingface_hub
from huggingface_hub import login

TOKEN = ""
login(TOKEN)

huggingface_hub.snapshot_download(repo_id="mistralai/Mistral-7B-v0.1",
                                  local_dir="models/mistral7b",
                                  local_dir_use_symlinks=False)

# Convert the Hugging Face checkpoint to .nemo format
!python /opt/NeMo/scripts/nlp_language_modeling/convert_hf_mistral_7b_to_nemo.py \
    --in-file=models/mistral7b --out-file=models/mistral7b.nemo

The final command runs the conversion script from NeMo’s nlp_language_modeling scripts directory to convert the Mistral checkpoint into .nemo format, so the NeMo framework can be used for fine-tuning.
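As an optional sanity check (not part of the original notebook), you can confirm the converted checkpoint was written before moving on:

# Optional: verify the converted checkpoint exists
!ls -lh models/mistral7b.nemo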

Step 3: Prepare fine-tuning data

This example fine-tunes Mistral 7B on the PubMedQA dataset. The task of PubMedQA is to answer medical research questions with yes/no/maybe. PubMedQA has 1K expert-labeled, 61.2K unlabeled, and 211.3K artificially generated QA instances. 

The following commands will convert the PubMedQA data into the .jsonl format for parameter-efficient fine-tuning (PEFT) with NeMo. It’s also necessary to reformat the data into prompts that the model can appropriately handle.

!git clone https://github.com/pubmedqa/pubmedqa.git
!cd pubmedqa/preprocess && python split_dataset.py pqal
import json

def write_jsonl(fname, json_objs):
    # Write one JSON object per line (.jsonl)
    with open(fname, 'wt') as f:
        for o in json_objs:
            f.write(json.dumps(o) + "\n")

def form_question(obj):
    # Format a PubMedQA record as a question/context/target prompt
    st = ""
    st += f"QUESTION:{obj['QUESTION']}\n"
    st += "CONTEXT: "
    for context in obj['CONTEXTS']:
        st += f"{context}\n"
    st += "TARGET: the answer to the question given the context is (yes|no|maybe): "
    return st

def convert_to_jsonl(data_path, output_path):
    # Convert a PubMedQA split into input/output pairs for NeMo PEFT
    with open(data_path, 'rt') as f:
        data = json.load(f)
    json_objs = []
    for k in data.keys():
        obj = data[k]
        prompt = form_question(obj)
        completion = obj['reasoning_required_pred']
        json_objs.append({"input": prompt, "output": completion})
    write_jsonl(output_path, json_objs)
    return json_objs

test_json_objs = convert_to_jsonl("pubmedqa/data/test_set.json", "pubmedqa_test.jsonl")
train_json_objs = convert_to_jsonl("pubmedqa/data/pqal_fold0/train_set.json", "pubmedqa_train.jsonl")
dev_json_objs = convert_to_jsonl("pubmedqa/data/pqal_fold0/dev_set.json", "pubmedqa_val.jsonl")

The training, validation, and test datasets are now ready for the fine-tuning task.
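Before training, it’s worth a quick look (not part of the original notebook) at one converted example to confirm the prompts and labels are formatted as expected:

# Inspect one converted example and the split sizes
print(train_json_objs[0]["input"][:200])
print("label:", train_json_objs[0]["output"])
print(len(train_json_objs), "train /", len(dev_json_objs), "val /", len(test_json_objs), "test examples")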

Step 4: Run training

After setting up GPU configurations, precision, maximum steps, and so on, initialize the training pipeline using the NeMo framework. The code below imports the necessary classes and modules, creates a trainer instance, sets up the experiment manager, and loads the pretrained model by merging the fine-tuning configuration with the checkpoint and restoring it with the provided trainer. It then adds a LoRA adapter and runs parameter-efficient fine-tuning (PEFT) of Mistral.
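The cell below assumes a cfg object that holds this configuration. The notebook prepares it before this point; the following is only a hedged sketch of what that step might look like, assuming the example fine-tuning config shipped in the NeMo container (the exact path, field names, and values are illustrative and may differ in your environment):

from omegaconf import OmegaConf

# Illustrative only: load NeMo's example SFT config and point it at our files
cfg = OmegaConf.load(
    "/opt/NeMo/examples/nlp/language_modeling/tuning/conf/megatron_gpt_finetuning_config.yaml"
)
cfg.model.restore_from_path = "models/mistral7b.nemo"
cfg.model.data.train_ds.file_names = ["pubmedqa_train.jsonl"]
cfg.model.data.validation_ds.file_names = ["pubmedqa_val.jsonl"]
cfg.model.data.test_ds.file_names = ["pubmedqa_test.jsonl"]
cfg.trainer.devices = 1
cfg.trainer.precision = "bf16"
cfg.trainer.max_steps = 100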

from nemo.collections.nlp.models.language_modeling.megatron_gpt_sft_model import MegatronGPTSFTModel
from nemo.collections.nlp.parts.megatron_trainer_builder import MegatronLMPPTrainerBuilder
from nemo.collections.nlp.parts.peft_config import LoraPEFTConfig
from nemo.utils.exp_manager import exp_manager

# Build the trainer and experiment manager from the config
trainer = MegatronLMPPTrainerBuilder(cfg).create_trainer()
exp_manager(trainer, cfg.exp_manager)

# Merge the fine-tuning config with the checkpoint's config and restore the model
model_cfg = MegatronGPTSFTModel.merge_cfg_with(cfg.model.restore_from_path, cfg)
model = MegatronGPTSFTModel.restore_from(cfg.model.restore_from_path, model_cfg, trainer=trainer)

# Attach a LoRA adapter and run parameter-efficient fine-tuning
model.add_adapter(LoraPEFTConfig(model_cfg))
trainer.fit(model)

Visit NVIDIA/NeMo-Framework-Launcher on GitHub to view documentation for the NVIDIA NeMo framework launcher and tools.

Step 5: View performance and results

Now that the Mistral 7B model has been fine-tuned, check how well it performs against the test dataset using the following command:

trainer.test(model)

The trainer.test(model) line executes the test loop, which evaluates the model’s performance on the PubMedQA test dataset. The output shows the test metrics, including the test loss, as well as the validation loss. Because LoRA adapters were used to fine-tune Mistral, this is a good way to measure how the model performs post-PEFT.
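To spot-check individual predictions, you can also generate completions directly. The following is a hedged sketch using NeMo’s text-generation API on one of the test prompts from Step 3; argument names and defaults can vary across NeMo versions:

# Illustrative sketch: generate an answer for one held-out PubMedQA prompt
response = model.generate(
    inputs=[test_json_objs[0]["input"]],
    length_params={"min_length": 1, "max_length": 8},
)
print(response)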

You’ve now successfully fine-tuned Mistral 7B with NeMo.

By acting as a single interface to all clouds and automating the setup process, Brev empowers you to fully harness NVIDIA software without leaving the NVIDIA ecosystem. With fewer infrastructure constraints, you can use the full capabilities of NGC software to streamline AI development and deployment at any project scale.

Get started

Get started with Brev’s 1-click deployment to provision GPU infrastructure. The first 2 hours are free.

In addition, Brev is actively expanding the 1-click deploy feature to more NVIDIA software on the NGC catalog. Explore the Quick Deploy with Brev.dev collection.
