Advancing Security for Large Language Models with NVIDIA GPUs and Edgeless Systems


Edgeless Systems introduced Continuum AI, the first generative AI framework that keeps prompts encrypted at all times. It achieves this with confidential computing, combining confidential VMs with NVIDIA H100 GPUs and secure sandboxing.

The launch of this platform underscores a new era in AI deployment, where the benefits of powerful LLMs can be realized without compromising data privacy and security. Edgeless Systems, a Germany-based cybersecurity company that develops open-source software for confidential computing, is collaborating with NVIDIA to empower businesses across sectors to confidently integrate AI into their operations.

The confidential LLM platform isn’t just a technological advancement—it’s a pivotal step towards a future where organizations can securely utilize AI, even for the most sensitive data.

The Continuum technology has two main security goals: it protects user data, and it protects AI model weights, against the infrastructure, the service provider, and everyone else. The infrastructure is the hardware and software stack the AI app runs on, including the underlying cloud platform; in the case of ChatGPT, this is Microsoft Azure. The service provider is the entity that provides and controls the AI app; in the case of ChatGPT, this is OpenAI.

How Continuum works

Continuum relies on two core mechanisms: confidential computing and advanced sandboxing. Confidential computing is a hardware-based technology that keeps data encrypted even during processing. It also makes it possible to verify the integrity of workloads.

Confidential computing, powered by NVIDIA H100 Tensor Core GPUs and combined with advanced sandboxing, enables customers to protect user data and AI models. It does this by creating a secure environment that separates the infrastructure and the service provider from the data and models. The platform also supports popular AI inference servers, such as NVIDIA Triton Inference Server.

Figure 1. A workflow showing how prompts and their corresponding responses remain encrypted as they travel to and from the AI model

Even with these security mechanisms in place, the AI code will likely come from a third party, which could accidentally or maliciously leak prompts, for example, by writing them to disk or to the network in plaintext.

One solution would be to review the AI code thoroughly. However, given the complexity and frequent updates of AI code, this is impractical.

Continuum addresses this problem by running the AI code inside a sandbox on the confidential computing-protected AI worker. In general terms, a sandbox is an environment that prevents an application from interacting with the rest of the system. Continuum runs the AI code inside an adapted version of Google's gVisor sandbox, which ensures that the AI code has no means to leak prompts or responses in plaintext. All the AI code can do is receive encrypted prompts, query the accelerator, and return encrypted responses.
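The deny-by-default behavior of such a sandbox can be illustrated with a short sketch. The toy Python example below enforces the policy by patching Python builtins; this is purely conceptual, since gVisor actually achieves isolation by intercepting system calls in a userspace kernel:

```python
import builtins
import socket
from contextlib import contextmanager

@contextmanager
def deny_io_sandbox():
    """Toy sandbox: deny-by-default for file writes and network sockets.

    This only illustrates the *policy* a sandbox enforces. gVisor achieves
    this by handling system calls in a userspace kernel, not by patching
    language builtins.
    """
    real_open, real_socket = builtins.open, socket.socket

    def guarded_open(file, mode="r", *args, **kwargs):
        if any(flag in mode for flag in ("w", "a", "x", "+")):
            raise PermissionError(f"sandbox: write access to {file!r} denied")
        return real_open(file, mode, *args, **kwargs)

    def guarded_socket(*args, **kwargs):
        raise PermissionError("sandbox: network access denied")

    builtins.open, socket.socket = guarded_open, guarded_socket
    try:
        yield
    finally:
        builtins.open, socket.socket = real_open, real_socket

def run_ai_code(prompt: str) -> str:
    # Stand-in for third-party AI code: it can compute a response,
    # but any attempt to leak the prompt to disk or network fails.
    return prompt.upper()

with deny_io_sandbox():
    print(run_ai_code("hello"))  # computation is allowed; I/O is not
```

Inside the `with` block, any call to `socket.socket()` or a write-mode `open()` raises `PermissionError`, mirroring how the real sandbox leaves the AI code no channel other than the encrypted request/response path.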

With this architecture in place, your prompts are even protected from the entity that provides the AI code. In simplified terms, in the case of the well-known ChatGPT, this means that you wouldn’t have to trust OpenAI (the company that provides the AI code) or Microsoft Azure (the company that runs the infrastructure).


Continuum consists of two parts: the server side and the client side. The server side hosts the AI service and processes prompts securely. The client side verifies the server, encrypts the prompts, and sends inference requests. Let's dive deeper into the components, how they interact, and their respective roles.

The server side hosts the inference service. Its architecture includes two main components: the workers and the attestation service.

The worker node is central to the backend. It hosts an AI model and serves inference requests. The necessary inference code and model are provided externally by the inference and model owner. The containerized inference code, called AI code, runs in a secure environment.

Each worker is a confidential VM (CVM) running Continuum OS. This OS is minimal, immutable, and verifiable through remote attestation. Continuum OS hosts workloads in a sandbox and mediates network traffic through an encryption proxy.

The worker provides an HTTPS API to manage (start and stop) AI code containers. 
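The article doesn't specify the shape of this API, so the endpoints and payload fields in the following sketch are purely hypothetical, meant only to show what a start/stop management request might look like:

```python
import json

def build_worker_request(action: str, container_image: str) -> dict:
    """Build a request for a hypothetical worker management API.

    The paths and payload fields here are illustrative assumptions;
    Continuum's actual worker API is not documented in this article.
    """
    if action not in ("start", "stop"):
        raise ValueError(f"unsupported action: {action}")
    return {
        "method": "POST",
        "path": f"/v1/containers/{action}",
        "body": json.dumps({"image": container_image}),
    }

req = build_worker_request("start", "registry.example.com/ai-code:latest")
print(req["method"], req["path"])  # -> POST /v1/containers/start
```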

AI code sandbox

The AI code, provided by the inference owner, runs in a gVisor sandbox. This sandbox isolates the AI code from the host, handling system calls in a userspace kernel and blocking network traffic to prevent data leaks.

Encryption proxy

Each AI code has an attached proxy container, which is its only connection to the outside world. The proxy manages prompt encryption on the client side. It decrypts incoming requests and sends them to the sandbox. In the opposite direction, it encrypts responses and sends them back to the user. The proxy supports various API adapters, such as OpenAI or Triton Generate.
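The proxy's round trip can be sketched as follows. The toy XOR keystream cipher below stands in for real authenticated encryption (such as AES-GCM), and the `model` function stands in for the sandboxed AI code; both are illustrative assumptions, not Continuum's actual implementation:

```python
import hashlib
from itertools import count

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy keystream: SHA-256 in counter mode. Illustrative only;
    # a real deployment would use an AEAD cipher such as AES-GCM.
    out = bytearray()
    for counter in count():
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        if len(out) >= length:
            return bytes(out[:length])

def xor_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    # XOR with the keystream; the same call encrypts and decrypts.
    ks = keystream(key, nonce, len(data))
    return bytes(a ^ b for a, b in zip(data, ks))

def model(prompt: str) -> str:
    # Stand-in for the sandboxed AI code: it only ever sees plaintext
    # handed to it by the proxy, and returns plaintext to the proxy.
    return f"echo: {prompt}"

def proxy_handle(key: bytes, nonce: bytes, encrypted_prompt: bytes) -> bytes:
    # The proxy decrypts the incoming request, forwards it to the
    # sandbox, and re-encrypts the response for the user.
    prompt = xor_cipher(key, nonce, encrypted_prompt).decode()
    return xor_cipher(key, nonce, model(prompt).encode())

# Client side: encrypt the prompt, send it, decrypt the reply.
key, nonce = b"session-key-from-attestation", b"req-1"
ciphertext = xor_cipher(key, nonce, b"What is confidential computing?")
reply = xor_cipher(key, nonce, proxy_handle(key, nonce, ciphertext))
print(reply.decode())  # -> echo: What is confidential computing?
```

The point of the sketch is the data flow: plaintext exists only inside the proxy and the sandbox, never on the wire, so neither the infrastructure nor the service provider observes the prompt.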

Attestation service

The attestation feature of CVMs ensures the integrity and authenticity of workers. This enables both the service provider and clients to verify the workers' integrity and confirm that they are interacting with a benign deployment.

The attestation service (AS) is centrally managed. On the server side, the AS verifies each worker based on its attestation statement. On the client side, the AS provides a system-wide attestation endpoint and handles key exchanges for prompt encryption.

The AS runs in a CVM. During initialization, the service provider uses the Continuum CLI to establish trust by verifying the AS attestation report.


In Figure 2, the flow details how admins verify the attestation service's integrity through the CLI. Upon successful verification, the admin sets the manifest using the CLI. Admins then interact directly with the workers, configuring the AI code through the worker API.

Workers register with the AS, which verifies their attestation reports. Verified workers receive inference secrets and can then serve inference requests.
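This registration flow can be sketched in a few lines. The example below reduces attestation to a hash comparison against an expected measurement; real CVM attestation relies on hardware-signed reports, so every name here is a simplification:

```python
import hashlib
import secrets

EXPECTED_MEASUREMENT = hashlib.sha256(b"continuum-os-v1").hexdigest()

class AttestationService:
    """Toy attestation service: verifies a worker's measurement and,
    on success, releases the inference secret. Real CVM attestation
    verifies hardware-signed reports, not a bare hash comparison."""

    def __init__(self, expected: str):
        self.expected = expected
        self.inference_secret = secrets.token_hex(16)

    def register_worker(self, attestation_report: str) -> str:
        if attestation_report != self.expected:
            raise PermissionError("attestation failed: unexpected measurement")
        return self.inference_secret  # only verified workers get the secret

ats = AttestationService(EXPECTED_MEASUREMENT)

# A benign worker measures the OS it actually booted.
benign_report = hashlib.sha256(b"continuum-os-v1").hexdigest()
secret = ats.register_worker(benign_report)

# A tampered worker produces a different measurement and is rejected.
tampered_report = hashlib.sha256(b"continuum-os-v1-with-backdoor").hexdigest()
try:
    ats.register_worker(tampered_report)
except PermissionError as err:
    print(err)  # -> attestation failed: unexpected measurement
```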

Users interact directly with the AS and the workers or through a trusted web service. Users verify the deployment using the AS and set their inference secrets. Then they can send encrypted prompts to the service. The encryption proxy decrypts these prompts, forwards them to the sandbox, re-encrypts the responses, and sends them back to the user.

Figure 2. A workflow displaying interactions between the different components of Continuum and the end user

For more details, check out Continuum to stay ahead in the realm of enterprise-grade confidential AI.
