
Data Science Best Practices for an Intelligent Edge Solution

Learn industry insights and best practices for implementing data science with AI at the edge.

Whether they are new to data science or have a mature strategy in place, many organizations come to a similar realization: most data does not originate at the core.

Data scientists often want access to volumes of data that cannot reasonably be streamed securely to the data center in real time. Whether the distance is 10 miles or thousands of miles, traditional IT infrastructure is simply not designed to stretch beyond fixed campuses.

This has led organizations to realize that no data science strategy is complete without an edge strategy. 

Read on to learn about the benefits of coupling data science and edge computing, the challenges involved, and solutions to those challenges. You can also register to view a demo of an edge architecture blueprint.

Edge architectures 

Edge computing is a style of IT architecture typically used to build systems that tolerate geographically distributed data sources and high-latency, low-bandwidth interconnects.

Because of restrictions imposed by the operating environment, systems designed this way typically compromise on computational speed and high availability.

Today, organizations commonly use three types of edge architecture:

  • Streaming data
  • Edge preprocessing
  • Autonomous systems

Streaming data 

In a "streaming data" model, data is collected at the edge and is immediately sent to the cloud for data reduction, ETL, and processing.
Figure 1. The streaming data architecture collects data at the edge and processes it in the cloud

Today, streaming data, the classic “big data” architecture, is the most popular architecture for organizations that are just starting to implement an edge strategy. It starts with IoT devices, usually sensors, placed anywhere from a factory floor to a hospital or retail store. The data is then sent through the cloud to an IT system.

As data processing capabilities grow, this architecture can become a hindrance because of the infrastructure it requires and the large quantity of data that must move from the edge to the core.
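To make the data-movement cost concrete, here is a minimal sketch of the streaming model. Everything in it is an assumption for illustration: `stream_all`, the reading fields, and the `uplink` list that stands in for the network link are hypothetical names, not part of any product or of the architecture described in the article.

```python
import json


def stream_all(readings, send):
    """Forward every reading unchanged and return the total bytes sent."""
    sent_bytes = 0
    for reading in readings:
        payload = json.dumps(reading).encode("utf-8")
        send(payload)  # no filtering: every reading crosses the link
        sent_bytes += len(payload)
    return sent_bytes


# Hypothetical stand-in for the network link from the edge to the cloud.
uplink = []

# Hypothetical sensor readings; a real deployment would read from devices.
readings = [
    {"sensor_id": "s1", "seq": i, "value": 20.0 + i * 0.1}
    for i in range(1000)
]

total_bytes = stream_all(readings, uplink.append)
print(f"{len(uplink)} messages, {total_bytes} bytes sent to the cloud")
```

Because nothing is filtered at the edge, the number of messages and bytes on the link grows in direct proportion to the number of readings.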

Edge preprocessing 

In an "edge preprocessing" model, data is collected and reduced at the edge so that only the most important data is sent to the cloud for processing.
Figure 2. Edge-preprocessing models are a hybrid edge and cloud model

The edge preprocessing model is the most common architecture for organizations transitioning to the edge.

Instead of feeding directly into a pipeline running in the data center, sensor data is fed into an intelligent data reduction application at the edge. This is usually a machine-learning algorithm that decides which data is important and must be sent back to the data center.

Extract, transform, and load (ETL) processes are less important in this architecture because data reduction has already occurred at the edge. Therefore, there is no need for two data lakes, and inference can happen more quickly. The result is faster execution of business logic.

This model is a good stepping stone toward fully autonomous systems, because the degree of data reduction applied at the edge can be increased over time.
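The data reduction step can be sketched as follows. The article describes a machine-learning algorithm here; this example swaps in a much simpler rolling z-score filter as a stand-in, and the class name `EdgeReducer`, its `window` and `threshold` parameters, and the sample values are all hypothetical.

```python
from collections import deque
from statistics import mean, pstdev


class EdgeReducer:
    """Keep a reading only if it deviates strongly from recent history."""

    def __init__(self, window=20, threshold=3.0, min_history=5):
        self.history = deque(maxlen=window)  # most recent readings
        self.threshold = threshold           # z-score cutoff (assumed value)
        self.min_history = min_history       # readings needed before judging

    def is_important(self, value):
        important = False
        if len(self.history) >= self.min_history:
            mu = mean(self.history)
            sigma = pstdev(self.history)
            # Flag the reading when it is far from the recent average.
            if sigma > 0 and abs(value - mu) / sigma > self.threshold:
                important = True
        self.history.append(value)
        return important


def reduce_stream(values, reducer):
    """Return only the readings worth sending to the data center."""
    return [v for v in values if reducer.is_important(v)]


values = [10.0, 10.1, 9.9, 10.0, 10.2, 9.8, 10.1, 50.0, 10.0, 9.9]
kept = reduce_stream(values, EdgeReducer())
print(f"kept {len(kept)} of {len(values)} readings: {kept}")
```

Only the readings in `kept` would be forwarded, so the uplink carries a small fraction of what the streaming model sends. A trained model could replace `is_important` without changing the surrounding flow.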

Autonomous systems 

In an "autonomous" model, data is processed at the edge of the network instead of being sent back to a cloud or data center for processing.
Figure 3. Autonomous systems process data at the edge and are characterized by rapid decision-making

Fully autonomous systems are characterized by sensors collecting data at the edge to make rapid decisions with low latency. With no time to send data back to a data center or cloud to make a proper decision, processing happens at the edge and actions are taken automatically. 

With this architecture, every step of the pipeline reports to a logging mechanism that records the decisions made at the edge. The logs are sent in batches to the cloud or core data center, where they support analytics on those decisions and adjustments to the system.
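The decide-locally, log-in-batches pattern described above could look roughly like this. It is a sketch under stated assumptions: `BatchLogger`, `control_step`, the temperature limit, and the `uploads` list standing in for the link to the core are hypothetical, and a real system would add retries and persistent buffering.

```python
import json
import time


class BatchLogger:
    """Buffer pipeline events and upload them to the core in batches."""

    def __init__(self, upload, batch_size=4):
        self.upload = upload          # callable that ships one batch
        self.batch_size = batch_size  # assumed batch size
        self.pending = []

    def record(self, step, detail):
        self.pending.append({"ts": time.time(), "step": step, "detail": detail})
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.pending:
            self.upload(json.dumps(self.pending))
            self.pending = []


def control_step(temperature_c, logger, limit_c=80.0):
    """Decide at the edge without a round trip, logging every step."""
    logger.record("sense", {"temperature_c": temperature_c})
    action = "shut_down" if temperature_c > limit_c else "continue"
    logger.record("act", {"action": action})
    return action


uploads = []  # hypothetical stand-in for the link to the cloud or core
logger = BatchLogger(uploads.append, batch_size=4)
actions = [control_step(t, logger) for t in (72.5, 85.0, 79.9)]
logger.flush()  # ship whatever is still buffered
print(actions, f"{len(uploads)} batches uploaded")
```

The action is returned before any upload is needed, so the decision latency does not depend on the connection to the core; the batches only feed later analysis.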

Industry insights for building the intelligent edge 

Building an intelligent edge solution is not just about pushing a container to tens or even thousands of sites. While that may seem like a trivial task, your organization’s success relies heavily on the infrastructure that you put in place, not just the data science.

Many complexities need to be considered when building an intelligent edge solution, such as scale, interoperability, and consistency.

Suggested technologies to build intelligent solutions include the following: 

  • Linux edge systems 
  • Containers 
  • Kubernetes 
  • Messaging (Kafka, MQTT, or bring your own)

Edge infrastructure in practice 

As your organization looks to meet its business needs and enable data science to drive innovation, your options should not be limited by your architecture. Implementing an edge architecture helps you future-proof your platform against new use cases and technologies.

While it is helpful to understand where your architecture stands among different stages of edge implementation, it is often best to view a live demonstration.  

For more information, view our webinar, Data Scientists on the Loose: Lessons Learned while Enabling the Intelligent Edge, for best practices regarding how to implement a Kubernetes system at the edge and the capabilities it can give your organization.
