The Future of Edge AI is Cloud-Native

Inference has emerged as the killer app for edge computing due to its flexibility. Today, inference at the edge (also called edge AI) solves problems across every industry: preventing theft, detecting illness, and reducing herbicide use on farms. But for many organizations, the complexity of managing distributed edge servers can erode the business value.

An edge AI data center does not have 10,000 servers in one location. It has one or more servers in 10,000 locations, often located where there is no physical security or trained IT staff. Therefore, edge AI servers must be secure, resilient, and easy to manage at scale.

Diagram shows the data center EGX servers accessing NGC frameworks and then uploading trained models; deploying the models to the edge and analyzing streaming data; sending low-confidence data back to the EGX servers; and using that data to re-train the models.
Figure 1. Data center to cloud workflow using edge AI

That is why organizations are turning to cloud-native technology to manage their edge AI data centers.

What is cloud-native?

Defining cloud-native is like the joke about describing an elephant while blindfolded. Are you touching a tusk, the trunk, or the tail?

  • For the IT administrator, cloud-native means managing infrastructure as code.
  • Software developers use cloud-native tools and techniques to write portable applications.
  • IT executives embrace cloud-native culture to reduce cost and increase efficiency.

Combining these perspectives, cloud-native is a modern approach to software development that uses abstraction and automation to support scale, portability, and rapid delivery.

Containerized microservices are the de facto standard for cloud-native applications. Kubernetes is the market-leading platform for container orchestration. It uses declarative APIs to support automation at scale.
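To make "declarative" concrete, here is a minimal Python sketch of the idea. It is not Kubernetes code: the function name `plan_actions` and the application names are hypothetical. The administrator states the desired replica count per application, and a control loop works out which actions close the gap with what is observed.

```python
def plan_actions(desired: dict, observed: dict) -> list:
    """Return the actions that move observed replica counts toward desired ones."""
    actions = []
    # Visit every application named in either state, in a stable order.
    for app in sorted(desired.keys() | observed.keys()):
        want = desired.get(app, 0)   # absent from desired state means "should not run"
        have = observed.get(app, 0)  # absent from observed state means "not running"
        if want > have:
            actions.append(f"start {want - have} x {app}")
        elif want < have:
            actions.append(f"stop {have - want} x {app}")
    return actions


if __name__ == "__main__":
    desired = {"detector": 2, "uploader": 1}       # hypothetical workloads
    observed = {"detector": 1, "legacy-agent": 1}
    for action in plan_actions(desired, observed):
        print(action)
```

The caller never says how to get from one state to the other. Running the same plan again after the actions succeed yields an empty list, which is what makes this style safe to automate across many sites.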

Cloud-native was born in the public cloud, but it is spreading fast in the enterprise. Gartner predicts that the container orchestration market will grow to $944 million by 2024.

The Cloud Native Computing Foundation (CNCF) provides vendor-neutral governance to the ecosystem. CNCF curates and sustains open-source, cloud-native software projects. Kubernetes, Prometheus, and containerd are popular projects maintained by CNCF.

Why cloud-native for edge AI?

How is cloud-native relevant to edge computing? Can tools built for massive public clouds benefit edge locations with one or two nodes?

The short answer is yes. Cloud-native architecture delivers more than massive scalability. It also delivers performance, resilience, and ease of management, all critical capabilities for edge AI.

Performance

For the past 15 years, enterprises preferred virtual machines (VMs) to consolidate applications onto fewer servers. But virtualization overhead can slow down application performance.

Edge AI favors containers. At the edge, performance is king. An autonomous vehicle has to slam on the brakes the moment it “sees” a pedestrian. Containers run with full bare metal performance. And many containers can share the same server, consolidating applications without virtualization’s performance overhead.

Kubernetes can also improve edge AI performance by optimizing workload placement. The CPU Manager's static policy dedicates CPU cores to specific workloads, which reduces context switches and cache misses. The device plugin framework exposes accelerators (like GPUs or FPGAs) to Pods. And Topology Manager aligns CPU, memory, and accelerator resources along NUMA domains, reducing costly cross-NUMA traffic.
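As a rough illustration of how a workload opts in to dedicated cores, the Python sketch below checks whether a container's resource settings would qualify for exclusive CPUs. It assumes the node's kubelet runs the static CPU Manager policy, under which exclusive cores go to containers in Guaranteed Pods that request a whole number of CPUs. The helper names are hypothetical, the check covers a single-container Pod only, and memory quantities are compared as plain strings for simplicity. `nvidia.com/gpu` is the resource name advertised by the NVIDIA device plugin.

```python
def cpu_millicores(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity such as "2" or "500m" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def qualifies_for_exclusive_cpus(resources: dict) -> bool:
    """Simplified check: Guaranteed QoS plus a whole-number CPU request."""
    limits = resources.get("limits", {})
    requests = resources.get("requests", {})
    if "cpu" not in limits or "memory" not in limits:
        return False
    # Requests that are left unset default to the corresponding limits.
    cpu_request = cpu_millicores(requests.get("cpu", limits["cpu"]))
    cpu_limit = cpu_millicores(limits["cpu"])
    memory_matches = requests.get("memory", limits["memory"]) == limits["memory"]
    guaranteed = cpu_request == cpu_limit and memory_matches
    whole_cores = cpu_request >= 1000 and cpu_request % 1000 == 0
    return guaranteed and whole_cores


if __name__ == "__main__":
    # Hypothetical inference container: 4 dedicated cores and one GPU.
    inference = {
        "requests": {"cpu": "4", "memory": "8Gi", "nvidia.com/gpu": "1"},
        "limits": {"cpu": "4", "memory": "8Gi", "nvidia.com/gpu": "1"},
    }
    print(qualifies_for_exclusive_cpus(inference))
```

A container that asks for a fractional CPU, or whose requests differ from its limits, stays in the shared CPU pool under this policy.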

Operations and management

An edge AI data center might span hundreds of locations. Cloud-native tools support the massive scalability of public clouds, and administrators can use the same tools to manage edge AI data centers.

Diagram shows multiple EGX servers with Helm charts connected to the cloud.
Figure 2. High-level architecture of an edge AI data center

Day one operations involve initial deployment and testing. Kubernetes is flexible enough to support diverse architectures on day one.

At one extreme, the entire edge data center is a single Kubernetes cluster. This architecture needs reliable communication between the centralized API endpoint and remote workers. The API endpoint is often cloud-based.

At the other extreme, every edge node is an independent cluster and maintains its own control plane and applications. This architecture is appropriate when centralized communication is intermittent or unreliable.

Kubernetes also supports cluster federation. Federated clusters share a single source of application configuration but are otherwise independent. Federation is appropriate for loosely coupled edge sites. For example, a hospital system may federate to share patient data.

After day one deployment, edge data center management shifts to day two operations. Updates, upgrades, and monitoring are day two operations. Automated and remote day two operations are critical for the stability and security of edge locations lacking local support staff.

The cloud-native ecosystem includes many popular tools for centralized observation. Prometheus is an open-source monitoring and alerting toolkit. Grafana is an open-source observability tool that can present data in graphical dashboards.
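For a sense of what an edge node hands to a central monitoring system, the sketch below renders one gauge in the Prometheus text exposition format using only the Python standard library. The metric name, label, and value are hypothetical, and label-value escaping is omitted for brevity. A real deployment would more likely use an official Prometheus client library than hand-built strings.

```python
def render_gauge(name: str, help_text: str, samples: list) -> str:
    """Render a gauge metric as Prometheus text exposition format.

    `samples` is a list of (labels_dict, value) pairs.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        # Sort labels so the output is deterministic.
        label_str = ",".join(f'{key}="{val}"' for key, val in sorted(labels.items()))
        lines.append(f"{name}{{{label_str}}} {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical metric reported by one edge site.
    print(
        render_gauge(
            "edge_inference_latency_seconds",
            "Latency of the most recent inference request.",
            [({"site": "store-0042"}, 0.018)],
        ),
        end="",
    )
```

Because each sample carries a site label, a single dashboard can compare many locations side by side.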

Software lifecycle management is also an important aspect of day two operations. Patching a VM image requires lengthy testing. Containers are bundled with their dependencies and interact with the kernel through stable interfaces. This enables CI/CD and other cloud-native practices that support rapid change at the edge.

Application resilience

Resilience refers to an application’s ability to overcome problems. This is another area where cloud-native benefits edge AI.

Cloud-native applications usually provide resilience through scaling. Multiple clones of the same application run behind a load balancer, and service continues when a clone fails.

This approach works well in edge AI deployments where applications span two or more nodes. But many edge AI data centers have only one node per location.
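The replica approach can be sketched in a few lines of Python. This is an illustration only, with a hypothetical `pick_replica` helper standing in for a load balancer: requests rotate across replicas and skip any that are unhealthy.

```python
def pick_replica(replicas: list, healthy: set, start: int):
    """Return the first healthy replica at or after index `start`, wrapping around.

    Returns None when no replica is healthy.
    """
    for offset in range(len(replicas)):
        candidate = replicas[(start + offset) % len(replicas)]
        if candidate in healthy:
            return candidate
    return None


if __name__ == "__main__":
    replicas = ["node-a", "node-b"]  # hypothetical two-node edge site
    # Both healthy: requests alternate between nodes.
    print(pick_replica(replicas, {"node-a", "node-b"}, start=0))
    print(pick_replica(replicas, {"node-a", "node-b"}, start=1))
    # node-a fails: traffic shifts to node-b and service continues.
    print(pick_replica(replicas, {"node-b"}, start=0))
    # Single-node site with a failed replica: nothing left to route to.
    print(pick_replica(["node-a"], set(), start=0))
```

The last call shows the limit of this technique: with one node per location, there is no second replica to absorb the failure.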

Kubernetes also supports application resilience on single nodes. The container restart policy automatically restarts failed containers in a Pod, and the kubelet can use liveness probes to detect conditions, such as a deadlock, where a container is still running but can only recover through a restart.
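The probe logic can be approximated as follows. This Python sketch is a simplified model, not kubelet code: it assumes a probe that returns healthy or unhealthy on each period and restarts the container after a set number of consecutive failures, mirroring the `failureThreshold` setting on a Kubernetes probe. The function name is hypothetical.

```python
def restarts_needed(probe_results: list, failure_threshold: int = 3) -> int:
    """Count restarts triggered by a sequence of probe results (True = healthy)."""
    restarts = 0
    consecutive_failures = 0
    for healthy in probe_results:
        if healthy:
            # Any success resets the failure streak.
            consecutive_failures = 0
            continue
        consecutive_failures += 1
        if consecutive_failures >= failure_threshold:
            restarts += 1
            consecutive_failures = 0  # the restarted container starts fresh


    return restarts


if __name__ == "__main__":
    # Two failures, a recovery, then three failures in a row.
    history = [True, False, False, True, False, False, False, True]
    print(restarts_needed(history))
```

Requiring several consecutive failures keeps a single slow response from triggering a restart, which matters when a restart means reloading a large model.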

The edge AI infrastructure software should be resilient as well. The Kubernetes operator pattern puts infrastructure administration on autopilot, automating tasks that a human typically performs. For example, a Kubernetes operator that detects a kernel upgrade on an edge node will automatically recompile the node’s drivers to the new kernel version.
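The driver example can be modeled as a small reconcile function. The Python sketch below is hypothetical throughout: the `Node` fields, the `reconcile` function, and the `rebuild` callback are illustrative names, not part of any real operator. It shows the general shape of the pattern, in which the operator compares observed state with required state and acts only when they differ.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    kernel_version: str     # kernel currently running on the node
    driver_built_for: str   # kernel version the installed driver was compiled against


def reconcile(node: Node, rebuild) -> bool:
    """Rebuild the driver if it does not match the running kernel.

    Returns True when a rebuild was performed, False when nothing was needed.
    """
    if node.driver_built_for == node.kernel_version:
        return False
    rebuild(node)  # caller supplies the actual build step
    node.driver_built_for = node.kernel_version
    return True


if __name__ == "__main__":
    rebuilt = []
    node = Node("edge-01", kernel_version="5.4.0-90", driver_built_for="5.4.0-80")
    print(reconcile(node, rebuilt.append))  # driver is stale, so a rebuild runs
    print(reconcile(node, rebuilt.append))  # already in sync, so nothing happens
```

Because the function is idempotent, the operator can run it on every node on a timer or on every change event without risking repeated rebuilds.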


Cloud-native delivers resilience and performance while simplifying operations. These are critical considerations for edge AI. However, there are still areas where cloud-native must continue to evolve.

Ultra-low latency edge applications need greater visibility into the underlying hardware, for example, to identify which core in a CPU has the lowest latency. Container orchestration platforms also need to improve workload isolation for multitenancy.

The benefits and challenges of cloud-native edge AI are just one of the edge computing topics that we are exploring at the upcoming virtual GTC AI conference in November. Register today and be sure to check out the Exploring Cloud-Native Edge AI session, along with our other edge computing topics.