Building AI systems with foundation models requires carefully balancing resources such as memory, latency, storage, and compute. One size does not fit all for developers managing cost and user experience while bringing generative AI capabilities to the rapidly growing ecosystem of AI-powered applications.
You need options for high-quality, customizable models that can support large-scale services hosted and deployed across different computing environments, from data centers to edge computing and on-device use cases.
Google DeepMind just announced Gemma 3, a new family of multimodal and multilingual open models. Gemma 3 consists of a 1B text-only small language model (SLM) and three image-text models in 4B, 12B, and 27B sizes. You can download the models from Hugging Face and demo the 1B model in the NVIDIA API Catalog.
The Gemma 3 1B model is optimized to run efficiently in on-device applications or environments that require low memory usage, with inputs up to 32K tokens. The Gemma 3 4B, 12B, and 27B models accept text, image, and multi-image inputs up to 128K tokens.
Experiment and prototype with optimized Gemma 3 models
Explore this model in the NVIDIA API Catalog, where you can experiment with your own data and configure parameters such as max tokens and the sampling values temperature and top-p.
The preview also generates the code you'd need in Python, Node.js, and Bash to integrate the model into your program or workflow. If you're using LangChain for building agents, connecting external data, or chaining actions, you can use the reusable client generated with the NVIDIA LangChain library.
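As a minimal sketch of the LangChain route, the snippet below wires the hosted endpoint into a `ChatNVIDIA` client from the `langchain-nvidia-ai-endpoints` package. The model ID and the prompt are assumptions for illustration; confirm the exact model name on the model card.

```python
import os

# Assumed model ID for the 27B instruct variant; check the model card
# in the NVIDIA API Catalog for the exact name.
params = {
    "model": "google/gemma-3-27b-it",
    "temperature": 0.2,   # lower values make sampling more deterministic
    "top_p": 0.7,         # nucleus-sampling cutoff
    "max_tokens": 512,    # cap on generated tokens
}

# Only call the hosted service when an API key is configured.
if os.environ.get("NVIDIA_API_KEY"):
    from langchain_nvidia_ai_endpoints import ChatNVIDIA

    llm = ChatNVIDIA(**params)  # picks up NVIDIA_API_KEY from the environment
    reply = llm.invoke("What makes Gemma 3 suited to edge deployment?")
    print(reply.content)
```

Because the client follows the standard LangChain chat-model interface, it can be dropped into chains and agents in place of any other chat model.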

To get started in your own environment, follow these steps:
- Create a free account with the NVIDIA API Catalog.
- Navigate to the Gemma 3 model card.
- Choose Build with this NIM and Generate API Key.
- Save the generated key as NVIDIA_API_KEY.
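With the key saved, the steps above can be exercised with a plain HTTPS request against the catalog's OpenAI-compatible chat-completions endpoint. This is a sketch, not the generated code from the preview; the model ID is an assumption, so check the model card for the exact name.

```python
import json
import os
import urllib.request

INVOKE_URL = "https://integrate.api.nvidia.com/v1/chat/completions"

# Request body: model ID is an assumed example for the 27B instruct variant.
payload = {
    "model": "google/gemma-3-27b-it",
    "messages": [
        {"role": "user", "content": "Summarize the Gemma 3 model family in one sentence."}
    ],
    "max_tokens": 512,    # cap on generated tokens
    "temperature": 0.2,   # lower values make sampling more deterministic
    "top_p": 0.7,         # nucleus-sampling cutoff
}

api_key = os.environ.get("NVIDIA_API_KEY")
if api_key:  # only call the service when a key is configured
    req = urllib.request.Request(
        INVOKE_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
        print(body["choices"][0]["message"]["content"])
```

The same payload shape works from any OpenAI-compatible client, which is why the preview can emit equivalent Python, Node.js, and Bash snippets.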
Next-level AI for next-gen robotics and edge solutions
Each Gemma 3 model can be deployed to the NVIDIA Jetson family of embedded computing boards used for robotics and edge AI applications. The smaller variants, 1B and 4B, can run on a device as small as the Jetson Nano. The 27B model, built for high-demand applications, can be served on the Jetson AGX Orin, which delivers up to 275 TOPS. For more information, see the latest Jetson Orin Nano Developer Kit announcement.
Ongoing collaboration of NVIDIA and Google
Google DeepMind and NVIDIA have collaborated on each release of Gemma. NVIDIA has played a key role in optimizing the models for GPUs and contributes to JAX (the Python machine learning library), Google's XLA compiler, OpenXLA, and many other projects.
Advancing community models and collaboration
NVIDIA is an active contributor to the open source ecosystem and has released several hundred projects under open source licenses.
NVIDIA is committed to open models such as Gemma that promote AI transparency and let users broadly share work in AI safety and resilience. Using the NVIDIA NeMo platform, these open models can be customized and tuned on proprietary data for AI workflows across any industry.
Get started today
Bring your data and try out Gemma on the NVIDIA-accelerated platform at Gemma models in the NVIDIA API Catalog.