
Getting Started with Edge AI on NVIDIA Jetson: LLMs, VLMs, and Foundation Models for Robotics

Running advanced AI and computer vision workloads on small, power-efficient devices at the edge is a growing challenge. Robots, smart cameras, and autonomous machines need real-time intelligence to see, understand, and react without depending on the cloud. The NVIDIA Jetson platform meets this need with compact, GPU-accelerated modules and developer kits purpose-built for edge AI and robotics.

The tutorials below show how to bring the latest open source AI models to life on NVIDIA Jetson, running completely standalone and ready to deploy anywhere. Once you have the basics, you can move quickly from simple demos to building anything from a private coding assistant to a fully autonomous robot.

Tutorial 1: Your Personal AI Assistant – Local LLMs and Vision Models

A great way to get familiar with edge AI is to run an LLM or VLM locally. Running models on your own hardware provides two key advantages: complete privacy and zero network latency.

When you rely on external APIs, your data leaves your control. On Jetson, your prompts—whether personal notes, proprietary code, or camera feeds—never leave the device, ensuring you retain complete ownership of your information. This local execution also eliminates network bottlenecks, making interactions feel instantaneous.

The open source community has made this incredibly accessible, and the Jetson you choose defines the size of the assistant you can run:

  • NVIDIA Jetson Orin Nano Super Developer Kit (8GB): Great for fast, specialized AI assistance. You can deploy high-speed SLMs like Llama 3.2 3B or Phi-3. These models are incredibly efficient, and the community frequently releases new fine-tunes on Hugging Face optimized for specific tasks—from coding to creative writing—that run blazingly fast within the 8GB memory footprint.
  • NVIDIA Jetson AGX Orin (64GB): Provides the high memory capacity and advanced AI compute needed to run larger, more complex models such as gpt-oss-20b or quantized Llama 3.1 70B for deep reasoning.
  • NVIDIA Jetson AGX Thor (128GB): Delivers frontier-level performance, enabling you to run massive 100B+ parameter models and bring data center-class intelligence to the edge.

If you have an AGX Orin, you can spin up a gpt-oss-20b instance immediately using vLLM as the inference engine and Open WebUI as a friendly web interface.

docker run --rm -it \
  --network host \
  --shm-size=16g \
  --ulimit memlock=-1 \
  --ulimit stack=67108864 \
  --runtime=nvidia \
  --name=vllm \
  -v $HOME/data/models/huggingface:/root/.cache/huggingface \
  -v $HOME/data/vllm_cache:/root/.cache/vllm \
  ghcr.io/nvidia-ai-iot/vllm:latest-jetson-orin

# Inside the container, launch the OpenAI-compatible server with gpt-oss-20b
vllm serve openai/gpt-oss-20b
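
Once the model has downloaded and the server reports that it's ready, you can optionally confirm the endpoint from a second terminal (assuming the default vLLM port of 8000, which is the same port the Open WebUI step below points at):

curl http://localhost:8000/v1/models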

Run Open WebUI in a separate terminal:

docker run -d \
  --network=host \
  -v ${HOME}/open-webui:/app/backend/data \
  -e OPENAI_API_BASE_URL=http://localhost:8000/v1 \
  --name open-webui \
  ghcr.io/open-webui/open-webui:main

Then, visit http://localhost:8080 in your browser.

From here, you can interact with the LLM and add tools that provide agentic capabilities, such as search, data analysis, and voice output (TTS).
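
You can also drive the same server from scripts. Because vLLM exposes a standard OpenAI-compatible API, a minimal chat request against the instance started above (still assuming port 8000) looks like this:

# Send a single chat completion request to the local vLLM server
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-oss-20b",
    "messages": [
      {"role": "user", "content": "Summarize what makes edge inference useful in one sentence."}
    ]
  }'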

Figure 1. Demonstration of gpt-oss-20b inference on NVIDIA Jetson AGX Orin using vLLM, achieving 40 tokens/sec generation speed via Open WebUI.

However, text alone isn’t enough to build agents that interact with the physical world; they also need multimodal perception. VLMs such as VILA and Qwen2.5-VL are becoming a common way to add this capability because they can reason about entire scenes rather than only detect objects. For example, given a live video feed, they can answer questions such as “Is the 3D print failing?” or “Describe the traffic pattern outside.”

On Jetson Orin Nano Super, you can run efficient VLMs such as VILA-2.7B for basic monitoring and simple visual queries. For higher-resolution analysis, multiple camera streams, or scenarios with several agents running concurrently, Jetson AGX Orin provides the additional memory and compute headroom needed to scale these workloads.

To test this out, you can launch the Live VLM WebUI from the Jetson AI Lab. It connects to your laptop’s camera via WebRTC and provides a sandbox that streams live video to AI models for instant analysis and description.

The Live VLM WebUI supports Ollama, vLLM, and most inference engines that expose an OpenAI-compatible server.

To get started with the Live VLM WebUI using Ollama, follow the steps below:

# Install ollama (skip if already installed)
curl -fsSL https://ollama.com/install.sh | sh

# Pull a small VLM-compatible model
ollama pull gemma3:4b

# Clone and start Live VLM WebUI
git clone https://github.com/nvidia-ai-iot/live-vlm-webui.git
cd live-vlm-webui
./scripts/start_container.sh

Next, open https://localhost:8090 in your browser to try it out.

This setup provides a strong starting point for building smart security systems, wildlife monitors, or visual assistants.

Figure 2. Interactive VLM inference using the Live VLM WebUI on NVIDIA Jetson.
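
Beyond the interactive WebUI, you can script the same kind of visual question answering for automated monitors. A minimal sketch using Ollama's REST API, which accepts base64-encoded images alongside a prompt, is shown below (frame.jpg is a placeholder for a frame you've captured from your camera):

# Ask the locally served gemma3:4b model a question about a single frame
# frame.jpg is a placeholder image; -w 0 keeps the base64 output on one line
curl http://localhost:11434/api/generate -d '{
  "model": "gemma3:4b",
  "prompt": "Is there a person in this image? Answer yes or no.",
  "images": ["'"$(base64 -w 0 frame.jpg)"'"],
  "stream": false
}'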

What VLMs Can You Run?

  • Jetson Orin Nano 8GB: Suitable for VLMs and LLMs up to nearly 4B parameters, such as Qwen2.5-VL-3B, VILA 1.5 3B, or Gemma 3 4B.
  • Jetson AGX Orin 64GB: Targets medium models in the 4B–20B range and can run VLMs like LLaVA-13B, Qwen2.5-VL-7B, or Phi-3.5-Vision.
  • Jetson AGX Thor 128GB: Designed for the largest workloads, supporting multiple concurrent models or single models from about 20B up to around 120B parameters, for example Llama 3.2 Vision 70B or 120B-class models.
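
If you're running Ollama, you can experiment across these size classes by pulling different model tags. The tags below are examples rather than a definitive list; check the Ollama library for what's currently published and how much memory each variant needs on your device:

# Orin Nano 8GB class: compact multimodal model
ollama pull qwen2.5vl:3b

# AGX Orin 64GB class: mid-size VLM
ollama pull llava:13b

# AGX Thor 128GB class: large VLM
ollama pull llama3.2-vision:90b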

Want to go deeper? Vision Search and Summarization (VSS) enables you to build intelligent archival systems. You can search videos by content rather than filenames and automatically generate summaries of long recordings. It’s a natural extension of the VLM workflow for anyone looking to organize and interpret large volumes of visual data.

Tutorial 2: Robotics with Foundation Models

Robotics is undergoing a fundamental architectural shift. For decades, robot control relied on rigid, hard-coded logic and separate perception pipelines: detect an object, calculate a trajectory, execute a motion. This approach requires extensive manual tuning and explicit coding for every edge case, making it difficult to automate at scale.

The industry is now moving toward end-to-end imitation learning. Instead of programming explicit rules, we’re using foundation models like NVIDIA Isaac GR00T N1 to learn policies directly from demonstration. These are Vision-Language-Action (VLA) models that fundamentally change the input-output relationship of robot control. In this architecture, the model ingests a continuous stream of visual data from the robot’s cameras along with your natural language commands (e.g., “Open the drawer”). It processes this multimodal context to directly predict the necessary joint positions or motor velocities for the next timestep.

However, training these models presents a significant challenge: the data bottleneck. Unlike language models that train on the internet’s text, robots require physical interaction data, which is expensive and slow to acquire. The solution lies in simulation. By using NVIDIA Isaac Sim, you can generate synthetic training data and validate policies in a physics-accurate virtual environment. You can even perform hardware-in-the-loop (HIL) testing, where the Jetson runs the control policy while connected to the simulator powered by an NVIDIA RTX GPU. This allows you to validate your entire end-to-end system, from perception to actuation, before you invest in physical hardware or attempt a deployment.

Once validated, the workflow transitions seamlessly to the real world. You can deploy the optimized policy to the edge, where optimizations such as TensorRT enable heavy transformer-based policies to run with the low latency (sub-30 ms) required for real-time control loops. Whether you’re building a simple manipulator or exploring humanoid form factors, this paradigm—learning behaviors in simulation and deploying them to the physical edge—is now the standard for modern robotics development.
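
As a rough illustration of that last deployment step, a policy exported to ONNX can be compiled into an optimized engine directly on the Jetson with the trtexec tool that ships with TensorRT; the file names below are placeholders, and the exact export path depends on your model:

# Build an FP16 TensorRT engine from an exported ONNX policy (placeholder file names)
trtexec --onnx=policy.onnx --saveEngine=policy.engine --fp16

# Measure the achieved per-inference latency of the built engine
trtexec --loadEngine=policy.engine --iterations=100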

You can begin experimenting with these workflows today. The Isaac Lab Evaluation Tasks repo on GitHub provides pre-built industrial manipulation benchmarks, such as nut pouring and exhaust pipe sorting, that you can use to test policies in simulation before deploying to hardware. Once validated, the GR00T Jetson deployment guide walks you through the process of converting and running these policies on Jetson with optimized TensorRT inference. For those looking to post-train or fine-tune GR00T models on custom tasks, the LeRobot integration enables you to leverage community datasets and tools for imitation learning, bridging the gap between data collection and deployment.
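
If you want to start on the LeRobot side of that workflow, the project is developed in the open on GitHub; a typical starting point is to clone the repository and install it into a fresh Python environment (installation details may change, so follow the repository's README):

# Fetch the LeRobot source
git clone https://github.com/huggingface/lerobot.git
cd lerobot

# Editable install; see the README for the currently recommended extras
pip install -e .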

Join the Community: The robotics ecosystem is vibrant and growing. From open-source robot designs to shared learning resources, you’re not alone in this journey. Forums, GitHub repositories, and community showcases offer both inspiration and practical guidance. Join the LeRobot Discord community to connect with others building the future of robotics.

Yes, building a physical robot takes work: mechanical design, assembly, and integration with existing platforms. But the intelligence layer is different. That is what Jetson delivers: real-time, powerful, and ready to deploy.

Which Jetson is Right for You?

Use Jetson Orin Nano Super (8GB) if you’re just getting started with local AI, running small LLMs or VLMs, or building early-stage robotics and edge prototypes. It’s especially well-suited for hobbyist robotics and embedded projects where cost, simplicity, and compact size matter more than maximum model capacity.

Choose Jetson AGX Orin (64GB) if you’re a hobbyist or independent developer looking to run a capable local assistant, experiment with agent-style workflows, or build deployable personal pipelines. The 64GB of memory makes it far easier to combine vision, language, and speech (ASR and TTS) models on a single device without constantly running into memory limits.

Step up to Jetson AGX Thor (128GB) if your use case involves very large models, multiple concurrent models, or strict real-time requirements at the edge.

Next Steps: Getting Started

Ready to dive in? Here’s how to begin:

  1. Choose your Jetson: Based on your ambitions and budget, select the developer kit that best fits your needs.
  2. Flash and set up: Our Getting Started Guides make setup straightforward, and you'll be up and running in under an hour.
  3. Explore the resources: Browse the Jetson AI Lab tutorials and community projects referenced throughout this post for working examples you can adapt.
  4. Start building: Pick a project, dive into the tutorial projects on GitHub, see what's possible, and then push further.

The NVIDIA Jetson family gives developers the tools to design, build, and deploy the next generation of intelligent machines.
