How to Automate AI Model Documentation with the NVIDIA MCG Toolkit

May 29, 2026

By Pratyusha Maiti and Michael Boone

AI-Generated Summary

NVIDIA's Model Card Generator (MCG) toolkit automates and standardizes the creation of comprehensive AI model documentation in Model Card++ format, improving transparency and regulatory compliance by extracting information directly from source code and associated files.
The MCG pipeline operates in three stages (Ingestion, Extraction, and Rendering), leveraging NVIDIA Inference Microservices and GPT-OSS-120B for high-precision data retrieval, content generation, and formatting, and producing an overview and four subcards covering Bias, Explainability, Privacy, and Safety & Security.
Customization options include configurable language models, templates, and field-level guides, allowing organizations to adapt the toolkit for different compliance needs and industry standards without altering core extraction logic.
Performance testing shows the toolkit generates model cards quickly (under a minute for most repositories) with a completion rate of 91% and accuracy of 76%, though results depend heavily on the availability and quality of supporting documentation.
When documentation is sparse or absent, the toolkit flags missing information instead of guessing, serving as both a documentation generator and a gap-finder to assist teams in maintaining up-to-date, auditable model records.
Oracle is a key early adopter, integrating the MCG toolkit into its OCI AI infrastructure to enhance model documentation and GPU resource optimization within dedicated AI clusters and cloud environments.

As AI models grow in complexity and regulatory scrutiny intensifies under frameworks including California’s AB-2013 and the EU AI Act, software teams face a challenge beyond delivering great code: They need to produce comprehensive, auditable model documentation before the models are released.

Model cards describe how a model works, its intended use and license, training data, performance, and limitations. They promote transparency and accountability so downstream users—customers, regulators, and affected communities—can make informed decisions when selecting and deploying AI. That audience extends beyond developers: Policymakers, procurement teams, and risk assessors rely on model cards to evaluate fitness for use and compare models across vendors.

In practice, creating model cards manually is tedious and slow. Documentation lags behind development, and metadata is often outdated by the ship date. As models grow more complex, inconsistent formatting and missing required fields create unnecessary audit risk and slow adoption. The NVIDIA Model Card Generator (MCG) toolkit automates and standardizes model documentation in Model Card++ format, in under a minute for most repositories, by reading directly from source data.

Introducing the NVIDIA MCG toolkit

The MCG toolkit is a containerized pipeline that automates model card generation by reading a model’s source code and associated files. It follows a modular Ingestion → Extraction → Rendering pipeline. A central orchestrator receives your request (either a URL or an uploaded file), coordinates the workflow, and returns a complete model card. Each stage runs as a separate service, so you can update or swap individual components without affecting the rest of the pipeline.

How the MCG toolkit works

The toolkit exposes an interactive UI that accepts a URL (GitHub, GitLab, HuggingFace, or any public web page) or an uploaded file (ZIP, PDF, DOCX, or Markdown). A REST API is also available for programmatic integration.
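
As a rough illustration of the two input paths, the sketch below tells a URL apart from an uploaded file and checks the file type against the formats listed above. The helper name `classify_input`, the returned dictionary shape, and the extension mapping are assumptions made for this example; they are not part of the toolkit's UI or REST API.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Upload formats named in the article; this extension mapping is an assumption.
SUPPORTED_UPLOADS = {".zip": "zip", ".pdf": "pdf", ".docx": "docx", ".md": "markdown"}


def classify_input(source: str) -> dict:
    """Describe a request as a URL or an uploaded file (hypothetical helper)."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return {"kind": "url", "host": parsed.netloc.lower()}
    suffix = PurePosixPath(source).suffix.lower()
    if suffix in SUPPORTED_UPLOADS:
        return {"kind": "file", "format": SUPPORTED_UPLOADS[suffix]}
    raise ValueError(f"unsupported input: {source!r}")


print(classify_input("https://github.com/example-org/example-model"))
print(classify_input("model_bundle.ZIP"))
```

Rejecting unsupported inputs up front keeps later stages from receiving content they cannot chunk.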

From there, data flows through three stages:

Input → Ingestion. The system fetches the content and processes it into document chunks, categorized by type: documentation, config files, and code.
Documents → Extraction. The extraction stage runs ingested documents through a retrieval-augmented generation (RAG) pipeline powered by NVIDIA Inference Microservices (NIM). NVIDIA Nemotron RAG handles high-precision embedding (llama-nemotron-embed-1b-v2) and reranking (llama-nemotron-rerank-500m-v2), with separate retrievers for code, config files, and documentation to prioritize higher-signal sources. The core extraction is performed by GPT-OSS-120B, which reads the retrieved passages and applies expert-curated formatting and content guides—the NVIDIA MC++ template and field-level style guides—to generate compliant information in the expected format. A validation step checks responses before they are accepted. Output is structured JSON. After the overview is complete, the same content flows to a subcards stage, which produces the four Model Card++ subcards: Bias, Explainability, Privacy, and Safety & Security.
JSON → Rendering. The structured JSON renders into human-readable Markdown using a configurable template. You can edit the content in the interface and re-render before downloading or integrating with other systems. The final artifact is a complete model card – overview plus four subcards – ready for review or publication.
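
The three stages can be pictured with a toy version of the data flow. This is a minimal sketch under stated assumptions, not the toolkit's implementation: the function names, the suffix sets, and the field names are hypothetical, and a simple "Field: value" line scan stands in for the RAG and LLM extraction step.

```python
import json
from pathlib import PurePosixPath
from string import Template

# Illustrative suffix sets; the toolkit's real categorization rules are not shown in the article.
DOC_SUFFIXES = {".md", ".txt", ".pdf"}
CONFIG_SUFFIXES = {".json", ".yaml", ".yml", ".toml"}


def ingest(files: dict[str, str]) -> list[dict]:
    """Stage 1: turn {path: text} into chunks tagged documentation, config, or code."""
    chunks = []
    for path, text in files.items():
        suffix = PurePosixPath(path).suffix.lower()
        if suffix in DOC_SUFFIXES:
            kind = "documentation"
        elif suffix in CONFIG_SUFFIXES:
            kind = "config"
        else:
            kind = "code"
        chunks.append({"path": path, "type": kind, "text": text})
    return chunks


def extract(chunks: list[dict], fields: list[str]) -> dict:
    """Stage 2 stand-in: scan for 'Field: value' lines instead of calling a model."""
    card = {}
    for name in fields:
        matches = [
            value.strip()
            for chunk in chunks
            for line in chunk["text"].splitlines()
            for key, sep, value in [line.partition(":")]
            if sep and key.strip().lower() == name.lower() and value.strip()
        ]
        # Flag the gap rather than guessing when nothing is found.
        card[name] = matches[0] if matches else "not found"
    return card


def render(card: dict, template: str) -> str:
    """Stage 3: fill a Markdown template from the structured JSON."""
    return Template(template).substitute(card)


files = {"README.md": "License: Apache-2.0\n", "config.json": "{}", "train.py": "print('hi')"}
card = extract(ingest(files), ["license", "architecture"])
print(json.dumps(card, indent=2))
print(render(card, "# Model Overview\n\n- License: $license\n- Architecture: $architecture\n"))
```

Because each stage only consumes the previous stage's output, any one of them can be replaced without touching the others, which mirrors the modular design described above.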

Designed for flexibility

You’re not locked into one model, template, or standard. The toolkit is customizable across three dimensions:

1) Models: The system uses configurable endpoints for the language model, embeddings, and reranking. Point to different NIMs or compatible APIs to match your performance, cost, or data residency requirements, whether you’re prototyping on a smaller model or scaling up for production.

2) Templates: The output format is driven by a Markdown template. Organizations can customize it for Model Card++, internal standards, or emerging regulatory formats without modifying the extraction logic. Outputs are also CycloneDX-compliant. When a new disclosure requirement appears, you update the template rather than the pipeline.

3) Guides: Field-level guidance—what to capture, how to phrase it—comes from configurable knowledge bases. As regulations or domain needs evolve, update the guides without touching the core code. The same pipeline can serve different industries and compliance regimes.
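
One way to picture the three customization dimensions is as a single configuration object in which models, template, and guides vary independently. The class names, field names, URLs, and file paths below are hypothetical and chosen only for illustration; the toolkit's actual configuration format is not described in this post. The model identifiers are the ones named earlier in the article.

```python
from dataclasses import dataclass, field, replace


@dataclass(frozen=True)
class ModelEndpoint:
    url: str    # hypothetical address of a NIM or compatible API
    model: str  # model identifier served at that endpoint


@dataclass(frozen=True)
class PipelineConfig:
    llm: ModelEndpoint
    embedder: ModelEndpoint
    reranker: ModelEndpoint
    template_path: str = "templates/model_card_pp.md"  # hypothetical path
    guides: dict = field(default_factory=dict)          # field name -> guidance text


default = PipelineConfig(
    llm=ModelEndpoint("http://localhost:8000/v1", "gpt-oss-120b"),
    embedder=ModelEndpoint("http://localhost:8001/v1", "llama-nemotron-embed-1b-v2"),
    reranker=ModelEndpoint("http://localhost:8002/v1", "llama-nemotron-rerank-500m-v2"),
    guides={"license": "State the license name and link to its full text."},
)

# Swap only the language model for prototyping; retrieval settings stay the same.
prototype = replace(default, llm=ModelEndpoint("http://localhost:8003/v1", "a-smaller-model"))

# Swap only the template for an internal standard; extraction settings stay the same.
internal = replace(default, template_path="templates/internal_standard.md")

print(prototype.llm.model, internal.template_path)
```

Keeping the three dimensions in separate fields is what lets a new disclosure requirement become a template or guide change rather than a code change.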

Run it where you need it

The toolkit ships as containerized services with a one-command setup. The orchestrator, ingestion, extraction, and subcards stages each run as separate containers, with infrastructure (database and task queue) included. There’s no proprietary cloud lock-in: MCG runs on-premises or in your own cloud, with Kubernetes support to help you spin up on your own infrastructure.

Performance results

We ran the toolkit through standardized testing on public model repositories to measure completion rate, generation time, and accuracy. Each field was scored against the source documentation. Accuracy is calculated as correct fields over non-placeholder fields. Table 1, below, shows the results.

Model	Time to Generate	Completion Rate	Accuracy
NVIDIA Nemotron Nano 8B	56s	97%	92%
NVIDIA Cosmos Reason 2	86s	94%	82%
NVIDIA Parakeet	65s	92%	87%
NVIDIA Proteina	52s	94%	82%
Third-party models (DeepSeek-V3, Evo2, Gemma, Llama)	~80s avg	~89%	~80%

Table 1. Performance on MC++ Overview across standardized test models. Completion rate = fields with meaningful content / total fields. Accuracy = correct / total non-placeholder responses.
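
The two metrics in the caption can be written out as a short calculation. The sketch below is illustrative only: the `score` function, the placeholder set, and the sample card are assumptions, with the set of correct fields standing in for a reviewer's judgment against the source documentation.

```python
PLACEHOLDERS = {"not found", "information not available"}


def is_filled(value: str) -> bool:
    """A field counts as meaningful if it is neither empty nor a placeholder."""
    text = value.strip().lower()
    return bool(text) and text not in PLACEHOLDERS


def score(card: dict[str, str], correct: set[str]) -> tuple[float, float]:
    """Return (completion_rate, accuracy) for one generated card.

    completion_rate = fields with meaningful content / total fields
    accuracy        = correct fields / non-placeholder fields
    """
    filled = [name for name, value in card.items() if is_filled(value)]
    completion = len(filled) / len(card)
    accuracy = sum(name in correct for name in filled) / len(filled) if filled else 0.0
    return completion, accuracy


# Hypothetical card: five fields, four populated, three judged correct.
card = {
    "license": "Apache-2.0",
    "architecture": "Transformer decoder",
    "training_data": "not found",
    "intended_use": "Chat assistant",
    "languages": "English",
}
completion, accuracy = score(card, correct={"license", "architecture", "languages"})
print(f"completion={completion:.0%} accuracy={accuracy:.0%}")  # completion=80% accuracy=75%
```

Note that placeholder fields lower completion but are excluded from the accuracy denominator, so a card that flags many gaps can still score high on accuracy.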

The toolkit generates a full model card (overview plus four subcards) in under a minute for most repositories. Overall completion reaches 91% (third-party baseline), with accuracy at 76% across the standardized test set. Completion and accuracy vary by model and repository; repositories with richer READMEs and config files yield higher results.

The toolkit performs best when supporting documentation exists and the codebase is well-structured, using code analysis to supplement where possible. When documentation is sparse or absent, fewer fields are populated; rather than guessing, the system surfaces “not found” or “information not available” to flag gaps for human review.

We also tested what happens when documentation is removed entirely. Using the same repositories from our standard test set, we stripped all .pdf, .md, and .txt files and re-ran the toolkit against code alone. Across five models, average completion rate dropped to 61% from 91%, and strict accuracy, measured only over verifiable fields, fell to 28%, compared with 76% in the standard test that scores accuracy over completed fields only.

The 61% completion shows the toolkit still extracts meaningful signals from code, config files, and repository structure alone; the accuracy drop reflects how much documentation contributes to getting those fields right.

Critically, the toolkit doesn’t compensate by guessing. If it cannot confidently populate fields, they are surfaced as “not found” or “information not available,” making it a useful gap-finder for teams whose documentation is still being written, as well as a generator for teams whose documentation is complete.
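
To use the output as a gap-finder, a team could scan the structured JSON for placeholder values and hand the resulting list to whoever owns the documentation. This is a minimal sketch: the JSON shape, the field names, and the `find_gaps` helper are hypothetical, and only the two placeholder strings come from the article.

```python
import json

PLACEHOLDERS = {"not found", "information not available"}


def find_gaps(node, prefix=""):
    """Yield dotted paths of every field whose value is a placeholder string."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from find_gaps(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(node, str) and node.strip().lower() in PLACEHOLDERS:
        yield prefix


# Hypothetical output covering the overview and one subcard.
card_json = """
{
  "overview": {"license": "Apache-2.0", "training_data": "Not found"},
  "privacy": {"personal_data": "Information not available"}
}
"""
gaps = list(find_gaps(json.loads(card_json)))
print(gaps)  # ['overview.training_data', 'privacy.personal_data']
```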

Early adopters and industry partners

Oracle is among our first partners to integrate the MCG toolkit into production infrastructure. As part of its OCI AI offering, which spans GPU configurations from the A10 to the GB200 NVL72, Oracle deployed the toolkit using a combination of OCI Container Engine for Kubernetes and its AI offerings, running MCG pods and NIM pods within a standard VCN architecture backed by Object Storage for the NIM models.

The deployment uses Llama-3.3-Nemotron-Super-49B-v1 as the core extraction model, with Nemotron RAG handling embedding and reranking. The GPT-OSS-120B model was also hosted and tested on both a dedicated AI cluster with two H100 GPUs and the on-demand offering of the model. As OCI supports increasingly powerful GPU infrastructure for large-scale AI training and inference, the need for consistent, auditable model documentation grows alongside it.

An OCI Dedicated AI Cluster (DAC) is a private, fully managed generative AI environment with its own dedicated GPUs, endpoints, and security boundary inside OCI. The MCG toolkit brings AI transparency tooling directly into that workflow, so customers do not have to build it themselves. It also helps customers identify the optimal GPU configuration for hosting models in both OCI Dedicated AI Cluster environments and bare-metal GPU infrastructure.

Getting started

If you’d like to be an early adopter, reach out to the Trustworthy AI team. We’re happy to discuss partnerships.

Not ready for the fully automated toolkit? The Trustworthy AI GitHub repository has open source Model Card++ templates and AI transparency cards for blueprints, datasets, containers, and systems you can use today.

Documentation should keep pace with the models you ship. Whether you adopt the MCG toolkit or start with our open source templates, NVIDIA’s Trustworthy AI initiative is committed to making that easier.

About the Authors

About Pratyusha Maiti
Pratyusha is the software engineer for trustworthy AI products at NVIDIA, where she is responsible for the development of tools advancing transparency and documentation in AI systems. Prior to NVIDIA, she was a research scientist at Georgia Institute of Technology, where she led the development of AI-powered virtual teaching assistants for online classrooms. She combines her expertise in language modeling and AI evaluation with a commitment to developing trustworthy AI tools that prioritize transparency, safety and responsible deployment.

About Michael Boone
Michael Boone is the Manager for Trustworthy AI Product at NVIDIA. He is responsible for building NVIDIA’s technology according to its guiding principles—driving the implementation of products, tools, and processes that enable the company, its customers, and the larger ecosystem to deploy AI with confidence. Beginning his career as a licensed civil engineer, Michael pivoted from transportation infrastructure project management and operations to owning NVIDIA’s global core computer vision product marketing strategy, as well as product feature definition for DRIVE AV. Michael brings a safety-first engineering mindset to the AI frontier, drawing on his background in physical infrastructure to ensure digital systems are built with the same principle and rigor. An inventor and car enthusiast, Michael is a highly trusted collaborator and a leading voice in the deployment of emerging technology across public, private, and research environments.
