Simplifying Access to Large Language Models with the NVIDIA NeMo Framework and Services

Recent advances in large language models (LLMs) have fueled state-of-the-art performance for NLP applications such as virtual scribes in healthcare, interactive virtual assistants, and many more.

To simplify access to LLMs, NVIDIA has announced two services: NeMo LLM for customizing and using LLMs, and BioNeMo, which expands scientific applications of LLMs for the pharmaceutical and biotechnology industries. The NVIDIA NeMo framework, an end-to-end framework for training and deploying LLMs, is now available to developers around the world in open beta.

NeMo LLM service

The NVIDIA NeMo LLM service provides the fastest path to customize foundation LLMs and deploy them at scale leveraging the NVIDIA managed cloud API or through private and public clouds.

NVIDIA and community-built foundation models can be customized using prompt learning capabilities, which are compute-efficient techniques, embedding context in user queries to enable greater accuracy in specific use cases. These techniques require just a few hundred samples to achieve high accuracy. Now, the promise of LLMs serving several use cases with a single model is realized.

Developers can build applications ranging from text summarization, to paraphrasing, to story generation, and many others, for specific domains and use cases. Minimal compute and technical expertise are required.

The Megatron 530B model is one of the world’s largest LLMs, with 530 billion parameters based on the GPT-3 architecture. It will soon be available to developers through the early access program on the NVIDIA NeMo LLM service. Model checkpoints will soon be available through HuggingFace and NGC, or for use through the service, including:

T5: 3B
NV GPT-3: 5B/20B/530B

Apply now to use NeMo LLM in early access.

Join us for the GTC 2022 session, Enabling Fast-Path to Large Language Model Based AI Applications to learn more.

BioNeMo service

The BioNeMo service, built on the NeMo framework, is a unified cloud environment for AI-based drug discovery workflows. Chemists, biologists, and AI drug discovery researchers can generate novel therapeutics; understand their properties, structure, and function; and ultimately predict binding to a drug target.

Today, the BioNeMo service supports state-of-the-art transformer-based models for both chemistry and proteomics. Support for DNA-based workflows is coming soon. The ESM-1 architecture provides equivalent capabilities for proteins, and OpenFold is supported for ease of use and scaling of workflows for predictions of protein structures. The platform enables an end-to-end modular drug discovery workflow to accelerate research and better understand proteins, genes, and other molecules.

Learn more about NVIDIA BioNeMo.

NeMo framework

NVIDIA has announced new updates to the NVIDIA NeMo framework, an end-to-end framework for training and deploying LLM up to trillions of parameters. The NeMo framework is now available to developers in open beta, on several cloud platforms including Microsoft Azure, Amazon Web Services, and Oracle Cloud Infrastructure, as well as NVIDIA DGX SuperPODs and NVIDIA DGX Foundry.

The NeMo framework is available as a containerized framework on NGC, offering an easy, effective, and cost-efficient path to build and deploy LLMs. It consists of an end-to-end workflow for automated distributed data processing; training large-scale customized GPT-3, T5, and multilingual T5 (mT5) models; and deploying models for inference at scale.

Its hyperparameter tool enables custom model development, automatically searching for the best hyperparameter configurations for both training and inference, on any given distributed GPU cluster configuration.

Large-scale models are made practical, delivering high training efficiency, using techniques such as tensor, data, pipeline parallelism, and sequence parallelism, alongside selective activation recomputing. It is also equipped with prompt learning techniques that enable customization for different datasets with minimal data, vastly improving performance and few-shot tasks.

Apply now to use the NeMo framework in open beta.

Join us for the GTC 2022 session, Efficient At-Scale Training and Deployment of Large Language Models (GPT-3 and T5) to learn more about the latest advancements.