Common Challenges with Conducting an Edge AI Proof of Concept

A proof-of-concept (POC) is the first step towards a successful edge AI deployment.

Companies adopt edge AI to drive efficiency, automate workflows, reduce cost, and improve overall customer experiences. As they do so, many realize that deploying AI at the edge is a new process that requires different tools and procedures than the traditional data center.

Without a clear understanding of what distinguishes a successful edge AI solution from an unsuccessful one, organizations often succumb to common pitfalls, starting in the POC process.

In fact, Gartner predicts that by 2025, 50% of edge computing solutions deployed without an enterprise edge computing strategy in place will fail to meet their goals for deployment time, functionality, or cost.

As the leading AI infrastructure company, NVIDIA has helped countless organizations, customers, and partners successfully build their edge AI POCs. This post details the common edge AI POC challenges and solutions.

Before you start

The first decision that an organization makes before starting the process is to determine whether to buy a solution from an AI software vendor or to build their own.

Typically, companies that do not have in-house AI expertise partner with a software vendor. Vendors have insight into the best practices and can provide guidance to make the POC process as streamlined and cost-effective as possible.

Companies that have the technical capability can build a custom solution at a lower cost.

Defining the steps from development to production

Figure 1. Four steps from AI model development to production. The workflow starts with ongoing model development (collecting data and training models), followed by a hands-on trial of 1-2 months (testing software with free trials), a proof of concept of 1-3 months (validating that the software works with company data), and finally production, where the application is monitored on an ongoing basis.

While the process of developing and deploying an application may vary for different organizations, most organizations follow this process:

  1. AI model development
  2. Hands-on trial
  3. Proof of concept
  4. Production

AI model development

Your data requirements depend on whether you’re using pretrained models or building from scratch. Even when an AI application is purchased, most models must still be retrained on labeled data from your environment to achieve the desired accuracy.

Data sources may include raw data from sensors at the edge, synthetic data, or crowdsourced data. Expect data collection to be the most time-consuming task of model development, followed by optimizing the training pipeline.

The purpose of this phase is to prove the feasibility of the project and model accuracy, not to get production-level performance. This phase is ongoing, as the model is continually retrained as new data is collected.

Hands-on trial

The more prepared organizations are for their POC, the smoother deployments will run. We highly recommend that you use free trials to test different software options before committing to them in the POC phase.

For example, free programs such as NVIDIA LaunchPad provide a curated experience with the hardware and software stacks needed to test and prototype end-to-end solution workflows. Because the same stack can then be deployed in production, you can make software and infrastructure decisions with more confidence.

Testing a solution before starting the POC streamlines the overall process and minimizes the common trap of entering a never-ending POC. 

Proof of concept

The POC is a 1–3-month engagement where IT requirements are defined, hardware is acquired, and models are trained with company data and deployed in the company’s production environment to limited locations.

Unlike the hands-on trial, the key to this step is incorporating the company’s data rather than just testing standard software and hardware with generic data. The goal of a POC’s validation process is to verify the problem-solution fit and confirm that the solution can meet business requirements. It acts as the final test before a solution is fully scaled.

Production

In production, the AI model is deployed to every intended location and is fully functioning. Ongoing monitoring is expected.

What are the common challenges?

Following these four steps maximizes the chances of a smooth deployment. Unfortunately, most enterprises get stuck in the POC phase because they did not properly scope out the project, understand the requirements, define the measures of success, or have the correct tools and processes in place. 

To get the most out of your POC program, have a solution in mind to combat the following common challenges that enterprises face when deploying AI at the edge:

  • Misalignment on POC design
  • Manual management of edge environments
  • POC creeps into production

Misalignment on POC design

When preparing for a POC project, first set expectations and then align on them. The steps should include identifying a high-value use case to solve, setting the project scope, determining measures of success, and ensuring stakeholder alignment.

High-value use case

Make sure that your problem statement is of high value and can be solved with AI. The key is to recognize which types of problems to hand off to AI and which can be solved through managerial changes or improved employee training.

Solving a problem that provides high value to your organization helps justify the resources and budget needed to prove the solution’s efficacy and enable scaling. Selecting a low-value use case runs the risk of the project losing focus before a full solution can be rolled out.

Examples of high-value use cases that solve a business problem include improving safety, efficiency, and customer experiences, and reducing costs and waste.

Measures of success

The purpose of a POC is to validate a solution quickly, so it’s important to run a focused POC with clear project goals.

If the success criteria are not properly defined, organizations typically experience the “moving goal post” phenomenon, where they find themselves constantly re-adjusting and re-designing the POC to meet ever-changing goals. A never-ending POC is costly and time-consuming.

The most common measures of success include:

  • Accuracy: Can the problem be solved with AI? Verify by testing whether the model can reach the desired accuracy. Accuracy is the first metric that should be tested. If model accuracy cannot be reached, then another solution should be put in place.
  • Latency: Does the solution add value to the overall system or process? It is not enough for a problem to be solvable with AI; the solution must also provide value. For example, if a computer vision application on a manufacturing line works but requires the company to operate the line at 50% speed, the cost of slowing down the line outweighs the benefit of using AI.
  • Efficiency: Is the solution cost-effective? Check whether the solution’s capital expenditures and operating expenditures are more favorable than other solutions. For example, if a network upgrade is necessary for the edge AI model to be effective, is it cheaper just to hire people to inspect products at your manufacturing line?

Defining the POC objectives, scope, and success criteria before executing the POC is the best way to understand whether the selected use case and solution can really achieve the intended benefits.
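As an illustration only, the three measures above can be written down as explicit thresholds before the POC starts and then checked against measured results. The following is a minimal sketch; the class names, field names, and threshold values are hypothetical and should be replaced with numbers agreed on by your stakeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SuccessCriteria:
    """Thresholds agreed on before the POC begins (hypothetical fields)."""
    min_accuracy: float      # required fraction of correct predictions
    max_latency_ms: float    # time budget per inference at full line speed
    max_annual_cost: float   # annual cost of the best alternative solution


@dataclass(frozen=True)
class PocResult:
    """Values measured during the POC (hypothetical fields)."""
    accuracy: float
    p95_latency_ms: float
    annual_cost: float       # amortized capital plus operating expenditures


def failed_criteria(criteria: SuccessCriteria, result: PocResult) -> list[str]:
    """Return the names of the measures that were not met; empty means pass."""
    failures = []
    if result.accuracy < criteria.min_accuracy:
        failures.append("accuracy")
    if result.p95_latency_ms > criteria.max_latency_ms:
        failures.append("latency")
    if result.annual_cost > criteria.max_annual_cost:
        failures.append("efficiency")
    return failures


# Example values are made up for illustration.
criteria = SuccessCriteria(min_accuracy=0.95, max_latency_ms=50.0,
                           max_annual_cost=120_000.0)
result = PocResult(accuracy=0.97, p95_latency_ms=62.0, annual_cost=90_000.0)
print(failed_criteria(criteria, result))  # ['latency']
```

Fixing the thresholds in one place before execution makes it harder for the goal posts to move mid-POC: a result either meets the recorded criteria or it does not.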

Stakeholder alignment

A POC requires a diverse team. To optimize your chances of success, identify and engage with both technical and business experts early on.

The involved stakeholders are usually business owners, AI developers, data scientists, IT, SecOps teams, and AI software providers. The AI software providers are particularly important because they have the knowledge, experience, and best practices. At this stage, identify the responsibilities of each stakeholder, including who owns the project after it scales.

Manual management of edge environments

Edge environments are unique because they are highly distributed, deployed in remote locations without trained IT staff, and often lack the physical security that a data center boasts.

These features present unique, often overlooked challenges when deploying, managing, and upgrading edge systems. It is extremely difficult and time-consuming for IT teams to troubleshoot issues manually at every remote edge site every time an upgrade is required or an issue arises.

Unfortunately, existing data center tools are not always applicable to edge AI environments. Moreover, because a POC is deployed to limited locations, organizations usually overlook a management tool during this phase and opt to update their models manually.

The POC is a highly iterative process, so implementing a management platform in this phase can help organizations save time. For customers who do not already have edge management tools in place, turnkey solutions like NVIDIA Fleet Command can help with the rollout of a POC as well as its transition to production.

Remote management

Once setup is complete and day-1 and day-2 operations begin, organizations must deploy and scale new applications, update existing applications, troubleshoot bugs, and validate new configurations.

Secure remote management capabilities are critical because production deployments contain important data and insights that must be kept safe.

Third-party access

Organizations should implement a management solution with advanced functionality for third-party access and security functions such as just-in-time (JIT) access, clearly defined access controls, and timed sessions.

Software vendors, system integrators, and hardware partners are just a few of the parties that may need access to your systems. With remote management functionality in place, third parties can help make updates to your POC environment without gaining physical access to your edge location.
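To make the idea of just-in-time access with timed sessions concrete, here is a minimal sketch of a grant that is scoped to one site, limited to named actions, and expires automatically. It does not represent the API of NVIDIA Fleet Command or any other product; every name in it (AccessGrant, grant_access, is_allowed, the site and action strings) is hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class AccessGrant:
    """A time-limited permission for one third party at one edge site."""
    user: str
    site: str
    allowed_actions: frozenset
    expires_at: datetime


def grant_access(user, site, actions, minutes, now):
    """Issue a grant that expires `minutes` after `now`."""
    return AccessGrant(user, site, frozenset(actions),
                       now + timedelta(minutes=minutes))


def is_allowed(grant, site, action, now):
    """Allow only the granted site and actions, and only before expiry."""
    return (
        grant.site == site
        and action in grant.allowed_actions
        and now < grant.expires_at
    )


# Example: a vendor gets 30 minutes to update a model at one location.
start = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
grant = grant_access("vendor-a", "store-17", {"update_model"}, 30, start)
print(is_allowed(grant, "store-17", "update_model", start + timedelta(minutes=10)))
print(is_allowed(grant, "store-17", "update_model", start + timedelta(minutes=45)))
```

The first check falls inside the session window and the second falls outside it, so access lapses without anyone having to remember to revoke it.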

Monitoring

Tracking performance is important, even in the POC phase, because it can help with sizing and showing where bottlenecks may occur. These are important considerations to iron out before scaling.
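As a rough sketch of what tracking performance during a POC might look like, the snippet below summarizes a batch of per-inference latency samples into median and tail values and counts how many exceeded a budget. The function names, the sample values, and the 33 ms budget are assumptions made for illustration; a real deployment would collect these numbers from its own telemetry.

```python
def percentile(samples, pct):
    """Nearest-rank percentile for an integer `pct` between 1 and 100."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Ceiling division in integers avoids floating-point rounding surprises.
    rank = max(1, -(-pct * len(ordered) // 100))
    return ordered[rank - 1]


def summarize(latencies_ms, budget_ms):
    """Summarize latency samples for sizing and bottleneck analysis."""
    return {
        "p50_ms": percentile(latencies_ms, 50),
        "p95_ms": percentile(latencies_ms, 95),
        "over_budget": sum(1 for value in latencies_ms if value > budget_ms),
    }


# Made-up samples: mostly fast, with two slow outliers.
samples = [10, 12, 11, 40, 13, 12, 11, 10, 12, 95]
print(summarize(samples, budget_ms=33))
# {'p50_ms': 12, 'p95_ms': 95, 'over_budget': 2}
```

Looking at the tail rather than the average is a deliberate choice here: a healthy median can hide occasional slow inferences, and those outliers are often what points to the bottleneck that will matter at scale.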

POC creeps into production

A POC does not have to be fully production-ready for it to be successful. While it is true that the closer an organization can get to production specs in the POC phase, the easier it will be to scale, most POCs are not designed for production.

Companies often use whatever hardware or software they have on hand. This means that upon completion of a POC, businesses should go back and update their models and hardware before their final deployment. Many do not.

Here are some tips for transitioning from POC to production.

Measure efficacy

Track the efficacy of all software and hardware to help make decisions on what should be moved into production, and what must be upgraded.

Use enterprise-grade hardware and software

While it is okay to use existing systems that a business may already have during a POC, take extra time to understand what systems are needed for production and any implications of that change.

Only use software from a trusted source with a line of support to speak to when needed. Many organizations deploying edge applications download software online without checking whether it comes from a trusted source, and inadvertently install malware as a result.

Prepare for success

Ultimately, POCs are just the first step to a successful deployment. They are designed to help organizations determine whether a project should move forward and whether it is an effective use of their resources. Edge AI is a paradigm shift for most organizations. To avoid common pitfalls when deploying your solution, see An IT Manager’s Guide: How to Successfully Deploy an Edge AI Solution.
