Run Autonomous, Self-Evolving Agents More Safely with NVIDIA OpenShell

The new open source runtime lets developers benefit from claws’ productivity while maintaining enterprise privacy and security.

AI has evolved from assistants that follow your directions to agents that act independently. Called claws, these agents can take a goal, figure out how to achieve it, and execute indefinitely, leaving you out of the loop. The more capable claws become, the harder they are to trust. And their self-evolving autonomy fundamentally changes the environment they need to operate in.

The infrastructure to run claws more safely didn’t exist, until now. 

NVIDIA today announced NemoClaw, an open source stack that simplifies running OpenClaw always-on assistants—with a single command. It incorporates policy-based privacy and security guardrails, giving you control over your agents’ behavior and data handling. This enables self-evolving claws to run more safely in the cloud, on prem, on NVIDIA RTX PCs, and on NVIDIA DGX Spark.

NVIDIA NemoClaw uses open source models—like NVIDIA Nemotron—alongside the NVIDIA OpenShell runtime, which is part of the NVIDIA Agent Toolkit. By combining powerful open source models with built-in safety measures, NemoClaw simplifies and secures AI agent deployment.

The NVIDIA Agent Toolkit, meanwhile, provides the full deployment stack—models, tools, evaluation, and runtimes—for building, testing, and optimizing long-running agents that can plan tasks; work across applications and enterprise data; and operate as dependable, production-ready services.

Released under the Apache 2.0 license, OpenShell sits between your agent and your infrastructure. It governs how the agent executes, what the agent can see and do, and where inference goes. OpenShell enables claws to run in isolated sandboxes, giving you fine-grained control over your privacy and security while letting you benefit from the agents’ productivity.

Run a single command, openshell sandbox create --remote spark --from openclaw, with zero code changes. Any claw or coding agent, such as OpenClaw, Anthropic’s Claude Code, or OpenAI’s Codex, can then run unmodified inside OpenShell.

This post discusses the evolution of AI agents and details how OpenShell works.

How claws introduce risk

Claws remember context across sessions, spawn subagents to act independently, write their own code to learn new skills mid-task, use tools, and keep executing long after you close your laptop. For the first time, an individual developer can spin up an agent that does the work of a team, running continuously and handling complexity that would have required coordination, pipelines, and weeks of time. 

Long-running agents like OpenClaw have shown productivity gains but also pose security risks. Today’s agent runtimes resemble the early days of the web. They’re powerful but missing core security primitives: sandboxing, permissions, and isolation.

For long-running, self-evolving agents to actually work, you need three things simultaneously: safety, capability, and autonomy. You can only reliably get two at a time with existing approaches. If safe and autonomous but without access to the tools and data it needs, the agent can’t finish the job. If capable and safe but gated on constant approvals, then you’re babysitting it. If capable and autonomous with full access, you’ve got a long-running process policing itself—guardrails living inside the same process they’re supposed to be guarding.

That last one is the critical failure mode. A stateless chatbot has no meaningful attack surface. An agent with persistent shell access, live credentials, the ability to rewrite its own tooling, and six hours of accumulated context running against your internal APIs is a fundamentally different threat model. Every prompt injection is a potential credential leak. Every third-party skill a claw installs is an unreviewed binary with filesystem access. Every subagent it spawns can inherit permissions it was never meant to have. 

The agents are ready. The environment you need to actually trust them has been missing.

How NVIDIA built OpenShell

The core architectural decision behind OpenShell is out-of-process policy enforcement. Instead of relying on behavioral prompts, it enforces constraints on the environment the agent runs in—meaning the agent cannot override them, even if compromised. This is the browser tab model applied to agents: Sessions are isolated, and permissions are verified by the runtime before any action executes.
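To make that architecture concrete, here is a minimal Python sketch of out-of-process enforcement. Every name in it (PolicyGate, Action) is illustrative only and is not OpenShell’s actual API; the point is that the policy object lives in the supervisor process, where the agent cannot reach it.

```python
# Conceptual sketch of out-of-process policy enforcement.
# All names here are hypothetical, not OpenShell's real interface.
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    kind: str      # e.g. "read", "connect", "exec"
    target: str    # file path, host, or binary

class PolicyGate:
    """Runs in the supervisor process. The agent never holds a reference
    to the policy, so a compromised agent cannot rewrite its own limits."""
    def __init__(self, allowed: set[tuple[str, str]]):
        self._allowed = frozenset(allowed)   # deny-by-default

    def check(self, action: Action) -> bool:
        # Verified by the runtime before any action executes.
        return (action.kind, action.target) in self._allowed

gate = PolicyGate({("read", "/workspace/data.csv")})
print(gate.check(Action("read", "/workspace/data.csv")))     # True
print(gate.check(Action("exec", "/tmp/unreviewed-binary")))  # False
```

Because the gate sits outside the agent’s process, a prompt injection can change what the agent asks for, but not what the runtime permits.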

Tools like Claude Code and Cursor ship with valuable internal guardrails and system prompts, but those protections live inside the agent. OpenShell wraps those harnesses, moving the ultimate control point entirely outside the agent’s reach.

The runtime will rely on many components, but here are some that NVIDIA is delivering today:

  • The sandbox is designed specifically for long-running, self-evolving agents. It is not generic container isolation. It handles skill development and verification, programmable system and network isolation, and isolated execution environments that agents can break without touching the host. Policy updates happen live at sandbox scope as developer approvals are granted, with a full audit trail of every allow and deny decision.
  • The policy engine enforces constraints on the agent’s environment across the filesystem, network, and process layers. Self-evolving agents require granular oversight to trust them when they’re installing packages, learning skills at runtime, and spawning scoped subagents. By evaluating every action at the binary, destination, method, and path level, the engine ensures an agent can install a verified skill but cannot execute an unreviewed binary. The agent gets the autonomy it needs to evolve within the boundaries you define. If an agent hits a constraint, it can reason about the roadblock and propose a policy update, leaving you with the final approval.
  • The privacy router keeps sensitive context on-device with local open models and routes to frontier models like Claude and GPT only when policy allows. The router makes decisions based on your cost and privacy policy, not the agent’s. OpenShell is model-agnostic by design and provides the environment where all agents and their harnesses can be governed.
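The deny-by-default evaluation and audit trail described above can be sketched in a few lines of Python. The rule schema, layer names, and method names here are assumptions made for illustration, not OpenShell’s real policy format.

```python
# Illustrative deny-by-default policy engine with an audit trail.
# Rule shapes and field names are assumptions, not OpenShell's schema.
import fnmatch

class PolicyEngine:
    def __init__(self, rules):
        # Each rule is a (layer, pattern) pair that is explicitly allowed;
        # anything not matched by a rule is denied.
        self.rules = list(rules)
        self.audit_log = []   # every allow/deny decision is recorded

    def evaluate(self, layer, target):
        allowed = any(l == layer and fnmatch.fnmatch(target, pat)
                      for l, pat in self.rules)
        self.audit_log.append((layer, target, "allow" if allowed else "deny"))
        return allowed

    def grant(self, layer, pattern):
        # Live policy update at sandbox scope, applied after approval.
        self.rules.append((layer, pattern))

engine = PolicyEngine([
    ("filesystem", "/workspace/*"),   # agent may touch its workspace
    ("network", "pypi.org"),          # may fetch verified packages
])

assert engine.evaluate("filesystem", "/workspace/notes.md")      # allow
assert not engine.evaluate("process", "/tmp/unreviewed-binary")  # deny
engine.grant("process", "/usr/bin/git")   # developer approves a binary
assert engine.evaluate("process", "/usr/bin/git")                # now allowed
```

Note that the agent never calls grant itself; a constraint it hits becomes a proposed policy update that waits for the developer’s approval.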
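Likewise, the privacy router’s core decision can be illustrated with a hypothetical routing function; the signature and labels below are invented for this sketch and are not OpenShell’s API.

```python
# Hypothetical sketch of a privacy router: decide, per request, whether
# inference stays on a local open model or may go to a frontier API.
def route(prompt: str, contains_sensitive: bool, allow_cloud: bool) -> str:
    # The routing policy, not the agent, owns this decision.
    if contains_sensitive or not allow_cloud:
        return "local"     # e.g. an open model served on-device
    return "frontier"      # e.g. Claude or GPT, only when policy allows

print(route("summarize this public README", False, True))   # frontier
print(route("draft an email with customer PII", True, True)) # local
```

In a real deployment the sensitivity and cost signals would come from policy configuration, so the agent cannot argue its way to a cloud model.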

How OpenShell enables the next generation of claws

OpenShell is designed to scale from a single developer on an NVIDIA DGX Spark or NVIDIA stack to enterprise-wide deployments, using the same primitives at every level. That includes deny-by-default, live policy updates, and a full audit trail whether you’re one developer or running an enterprise GPU cluster. 

The adoption of claws is only accelerating, and the infrastructure decisions made in the next six to 12 months will shape what enterprise agent deployment looks like for a long time.

Agents built with OpenShell can continuously build new skills over time using popular coding agents like Claude Code, Codex, Cursor, and OpenCode—and you can add tools, models, and behaviors through the sandbox interface while keeping every new capability subject to the same policy and privacy controls.

Get started with OpenShell today by visiting the NVIDIA GitHub repo and running it on your NVIDIA DGX Spark, NVIDIA DGX Station, or a dedicated PC with an NVIDIA RTX GPU.
