NeMo Curator
Jan 13, 2025
Enhancing Generative AI Model Accuracy with NVIDIA NeMo Curator
In the rapidly evolving landscape of artificial intelligence, the quality of the data used for training models is paramount. High-quality data ensures that...
5 MIN READ
Jan 09, 2025
Announcing Nemotron-CC: A Trillion-Token English Language Dataset for LLM Pretraining
NVIDIA is excited to announce the release of Nemotron-CC, a 6.3-trillion-token English language Common Crawl dataset for pretraining highly accurate large...
4 MIN READ
Jan 09, 2025
Advancing Physical AI with NVIDIA Cosmos World Foundation Model Platform
As robotics and autonomous vehicles advance, accelerating development of physical AI—which enables autonomous machines to perceive, understand, and perform...
14 MIN READ
Dec 19, 2024
Enhance Your Training Data with New NVIDIA NeMo Curator Classifier Models
Classifier models are specialized in categorizing data into predefined groups or classes, playing a crucial role in optimizing data processing pipelines for...
11 MIN READ
Nov 19, 2024
Processing High-Quality Vietnamese Language Data with NVIDIA NeMo Curator
Open-source large language models (LLMs) excel in English but struggle with other languages, especially the languages of Southeast Asia. This is primarily due...
17 MIN READ
Nov 13, 2024
Mastering LLM Techniques: Data Preprocessing
The advent of large language models (LLMs) marks a significant shift in how industries leverage AI to enhance operations and services. By automating routine...
14 MIN READ
Nov 06, 2024
State-of-the-Art Multimodal Generative AI Model Development with NVIDIA NeMo
Generative AI has rapidly evolved from text-based models to multimodal capabilities. These models perform tasks like image captioning and visual question...
6 MIN READ
Oct 28, 2024
Upcoming Webinar: Enhance Generative AI Model Accuracy Through High-Quality Data Processing
Learn how to build scalable data processing pipelines to create high-quality datasets.
1 MIN READ
Oct 15, 2024
Train Highly Accurate LLMs with the Zyda-2 Open 5T-Token Dataset Processed with NVIDIA NeMo Curator
Open-source datasets have significantly democratized access to high-quality data, lowering the barriers to entry for developers and researchers to train...
5 MIN READ
Oct 15, 2024
DataStax Announces New AI Development Platform, Built with NVIDIA AI
As enterprises increasingly adopt AI technologies, they face the complex challenge of efficiently developing, securing, and continuously improving AI applications...
6 MIN READ
Oct 04, 2024
Just Released: NVIDIA NeMo Curator Improvements for Accelerating Data Curation
NeMo Curator now supports images, enabling you to process data for training accurate generative AI models.
1 MIN READ
Sep 10, 2024
Streamlining Data Processing for Domain Adaptive Pretraining with NVIDIA NeMo Curator
Domain-adaptive pretraining (DAPT) of large language models (LLMs) is an important step towards building domain-specific models. These models demonstrate...
16 MIN READ
Jul 31, 2024
Curating Custom Datasets for LLM Parameter-Efficient Fine-Tuning with NVIDIA NeMo Curator
In a recent post, we discussed how to use NVIDIA NeMo Curator to curate custom datasets for pretraining or continuous training use cases of large language...
11 MIN READ
Jul 23, 2024
Supercharging Llama 3.1 across NVIDIA Platforms
Meta's Llama collection of large language models is the most popular family of foundation models in the open-source community today, supporting a variety of use cases...
8 MIN READ
Jul 23, 2024
Customize Generative AI Models for Enterprise Applications with Llama 3.1
The newly unveiled Llama 3.1 collection of 8B, 70B, and 405B large language models (LLMs) is narrowing the gap between proprietary and open-source models. Their...
10 MIN READ
Jul 10, 2024
Curating Non-English Datasets for LLM Training with NVIDIA NeMo Curator
Data curation plays a crucial role in the development of effective and fair large language models (LLMs). High-quality, diverse training data directly...
12 MIN READ