
Open Source Time Synchronization Services for Data Center Operators

Scalable PTP stack

Applications are increasingly real-time and delay-sensitive, from distributed databases and 5G radio access networks (RANs) to gaming, video streaming, high-performance computing (HPC), and the metaverse. Nanosecond resolution time sync augments conventional computing in many ways, including:

  • improving the accuracy, efficiency, and security of data management systems by ensuring that distributed databases are kept up-to-date and consistent with each other
  • enhancing security policies by distinguishing authentic user activity from malicious and robotic activity simply by examining latency patterns
  • enabling the “Ready Player One”-style metaverse of gaming worlds
  • creating immersive shopping experiences, helping customers make informed purchasing decisions and reduce checkout hassle with computer vision and real-time analytics
  • automating large factories and facilities, driving production lines, warehouses, and machinery to new efficiencies by enabling the digital factory twin to mimic the real one, and vice versa
  • maintaining the accuracy, correct distribution, and on-time processing of the incoming bands in 5G networks

Through a series of collaborations, NVIDIA, Meta, and others in the Open Compute Project Time Appliance Project (OCP-TAP) established blueprints for a modern time synchronization solution that is open, reliable, and scalable.

An open time synchronization solution

Meta achieved submicrosecond precision within its large, globally distributed data centers, using hardware timestamping on commodity servers, even under CPU and network load and across temperature variations.

Until recently, deploying Precision Time Protocol (PTP) at such a high scale required specialized and dedicated hardware and software components. In addition, there was an absence of good blueprints for how to enable precise time services in data centers. 

Precision Time Protocol tree topology diagram for data centers, including spine switches, ToR switches, NVIDIA NICs, and Open Time Server.
Figure 1. Precision Time Protocol (PTP) tree for data centers

This is where OCP-TAP comes in; specifically, the Time Card innovation enables Meta to synchronize time between data centers. PTP IEEE 1588, applied on network interface cards (NICs) and networking devices like the NVIDIA ConnectX, synchronizes all the machines within a data center over the network.
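At the heart of that network synchronization is the IEEE 1588 two-way timestamp exchange: the master's Sync send time (t1), the client's receive time (t2), the client's Delay_Req send time (t3), and the master's receive time (t4). A minimal sketch of the arithmetic, with timestamps invented for illustration:

```python
def ptp_offset_and_delay(t1, t2, t3, t4):
    """IEEE 1588 delay request-response mechanism.
    t1: Sync sent by master, t2: Sync received by client,
    t3: Delay_Req sent by client, t4: Delay_Req received by master.
    All timestamps in nanoseconds; assumes a symmetric path."""
    offset = ((t2 - t1) - (t4 - t3)) / 2   # client clock minus master clock
    delay = ((t2 - t1) + (t4 - t3)) / 2    # one-way path delay
    return offset, delay

# Example: client runs 500 ns ahead of the master, path delay is 2,000 ns.
off, dly = ptp_offset_and_delay(t1=0, t2=2500, t3=10000, t4=11500)
# off == 500.0, dly == 2000.0
```

The symmetric-path assumption is exactly what transparent clocks (discussed below) help preserve, by accounting for variable queuing delay inside switches.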

A time server that scales to millions of clients

The Open Time Server, which is open sourced by the OCP-TAP community, maintains the authoritative source of time for the data center. 

Diagram of the Open Time Server layers: Management and Monitoring, System Software, Time Card, NIC, and COTS Server.
Figure 2. Layers of the Open Time Server

The Time Card can support millions of clients/syncs. The NIC is capable of “full-wire-speed hardware timestamping.” Bottlenecks are pushed to the software domain.

Meta engineers rewrote the entire master functionality of a PTP daemon, with a software architecture and design built for scalability. This stack is now known as PTP4U, a scalable PTP stack. For more details, visit facebook/time on GitHub.
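PTP4U itself is written in Go; as a loose illustration of the serving model (one lightweight worker per subscribed client, each sending Sync messages at that client's negotiated interval), here is a hypothetical Python asyncio sketch with invented names, not the actual implementation:

```python
import asyncio

async def subscriber_loop(client_id, interval_s, n_syncs, sent):
    # One cheap task per subscribed client; each iteration stands in
    # for sending a Sync/Follow_Up pair over UDP to that client.
    for seq in range(n_syncs):
        sent.append((client_id, seq))
        await asyncio.sleep(interval_s)

async def serve(n_clients, interval_s=0.0, n_syncs=3):
    sent = []
    await asyncio.gather(*(
        subscriber_loop(c, interval_s, n_syncs, sent)
        for c in range(n_clients)
    ))
    return sent

# 1,000 simulated subscribers, 3 Sync messages each.
sent = asyncio.run(serve(n_clients=1000))
```

The point of the design is that per-client state is tiny, so the bottleneck becomes raw packet throughput rather than connection bookkeeping.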

The Open Time Server consistently supported over 1 million clients (ordinary clocks) with a synchronization frequency of 1 Hz, using the PTP4U server software.

Screenshot of PTP4U software in action, syncing over 1 million clients.
Figure 3. The scalable PTP stack PTP4U scales beyond 1 million clients

Commercial grandmaster clocks support up to several hundred clients, while hyperscale data centers require many orders of magnitude more. The need to support timing at remote edge locations of the network further increases the required scale.

Building a huge Open Time Server

If you had asked a PTP expert how to scale a PTP solution in the summer of 2021, the answer would likely have been to use a boundary clock (BC). There are two challenges with introducing BCs into data centers. 

The first challenge is operational. While not specific to Meta, BC implementation on network switches assumes certain hardware and software support. Introducing BCs into existing brownfield deployments poses a significant risk. The switches are the core elements of the entire network, and enabling BCs on all participating switches would require requalifying the entire network. This is a long, intensive, expensive, and risky procedure; at that time, the ROI would have been impossible to justify.

The second challenge relates to synchronization technology mandating that each compute node know not only the precise time, but also the uncertainty window, or degree of accuracy. For more details, see Spanner, TrueTime, and the CAP Theorem.

This means having an easy method to determine, for every participating node in the data center, the time offset from the grandmaster (and not only from the direct master as for BCs). A Time Server scaling to millions could rely on transparent clocks (TCs), avoiding BCs altogether.
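The value of a known uncertainty window is that events can be ordered across machines without coordination. A hypothetical TrueTime-style sketch (class and function names invented here; units are nanoseconds):

```python
class Uncertain:
    """A time reading with a bounded uncertainty: the true time is
    guaranteed to lie in [earliest, latest]."""
    def __init__(self, now_ns, eps_ns):
        self.earliest = now_ns - eps_ns
        self.latest = now_ns + eps_ns

def definitely_before(a, b):
    # a provably happened before b only if their windows cannot overlap.
    return a.latest < b.earliest

# With +/-100 ns windows, events 500 ns apart can be ordered ...
tight_a = Uncertain(1_000_000, eps_ns=100)
tight_b = Uncertain(1_000_500, eps_ns=100)
# ... but with +/-1,000 ns windows, the same two events cannot.
loose_a = Uncertain(1_000_000, eps_ns=1000)
loose_b = Uncertain(1_000_500, eps_ns=1000)
```

This is why knowing the offset from the grandmaster directly (rather than from an intermediate master) matters: it keeps the uncertainty window small and well defined for every node.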

Transparent clocks for data centers

Transparent clocks do not contribute to the total accumulated noise of the clock tree, simply because TCs are not really clocks, and they do not discipline any clocks. Instead, TCs simply publish the packets’ residency time, typically less than 1 microsecond, a small enough period that even a simple oscillator will not drift dramatically. 

Transparent clocks also reduce the operational complexity. They do not run software daemons and are more commonly supported by existing switches. This makes the introduction of PTP into brownfield data centers much simpler. 

Finally, TCs are transparent, such that each node is disciplined directly by the grandmaster clock. This facilitates directly figuring out the uncertainty window for all participating nodes.
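Mechanically, each end-to-end transparent clock adds its measured residency time to the correctionField of the passing PTP message, and the end node subtracts the accumulated correction. A simplified sketch with invented numbers:

```python
def corrected_interval(t1, t2, residence_times_ns):
    """t1: Sync egress timestamp at the grandmaster; t2: ingress
    timestamp at the client. Each switch on the path adds its residency
    time to the PTP correctionField instead of disciplining any clock."""
    correction = sum(residence_times_ns)
    # What remains after removing switch queuing is (path delay + offset),
    # as if the client were wired directly to the grandmaster.
    return (t2 - t1) - correction

# Example: three switches queue the Sync for 800, 350, and 120 ns.
raw = corrected_interval(t1=0, t2=5270, residence_times_ns=[800, 350, 120])
# raw == 4000
```

Because only the (variable) queuing delay is removed, the switches never need a disciplined clock of their own, just a free-running oscillator good enough over a sub-microsecond residency.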

Precision and accuracy in hardware 

A monolithic hardware clock supporting UTC is key for timestamping packets at full wire speed, even in high-speed networks. NVIDIA added support for hardware timestamping in PTP4L (the IEEE 1588-2008 Linux PTP daemon), which enables the system and applications to obtain time in UTC format.

NVIDIA also made several other changes to PTP4L to improve its accuracy; for example, adding support for the use of a hardware reference clock, which can provide a higher level of accuracy than a software-based clock.
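On Linux, the NIC's PTP hardware clock (PHC) is exposed as a dynamic POSIX clock through a character device such as /dev/ptp0. A hedged sketch of reading it (the device path is illustrative, and actually reading it requires a host with PTP-capable hardware; the clock-ID encoding follows the kernel's dynamic POSIX clock convention):

```python
import os
import time

def fd_to_clockid(fd):
    # Linux dynamic POSIX clock convention: encode an open file
    # descriptor for a /dev/ptpN device into a clockid_t value.
    return ((~fd) << 3) | 3

def read_phc_ns(path="/dev/ptp0"):
    # Returns the PHC time in nanoseconds; works only on hosts
    # that expose a PTP hardware clock device.
    fd = os.open(path, os.O_RDONLY)
    try:
        return time.clock_gettime_ns(fd_to_clockid(fd))
    finally:
        os.close(fd)
```

Reading the PHC directly (rather than the system clock) is what lets applications see the hardware-disciplined time that the NIC uses to timestamp packets on the wire.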

Testing PTP reliability at scale

Studying how well PTP runs on a high-scale network mandates a method to constantly measure, gauge, and validate the synchronization precision at a high scale. We devised an infinitely scalable test method using the ConnectX-6 Dx Pulse Per Second input (PPS-In) as the measurement. (The PPS-Out method would max out at a handful of devices.)

To do this, we configured the ConnectX to run in real-time clock mode and chained devices, feeding each device's PPS-Out into the next device's PPS-In (Figure 4). Using this method, we characterized very large PTP trees and validated our PTP solution down to the nanosecond level.
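One caveat of daisy-chaining is that each hop's measurement error accumulates along the chain: a device's offset from the root PPS source is the sum of the per-hop offsets measured up to it. A small sketch with invented numbers:

```python
from itertools import accumulate

def offsets_from_root(per_hop_offsets_ns):
    # Offset of device k relative to the root PPS signal is the running
    # sum of the per-hop offsets measured along the chain up to k.
    return list(accumulate(per_hop_offsets_ns))

# Three chained hops measuring +3, -2, and +5 ns against their
# predecessor put the last device 6 ns from the root signal.
chain = offsets_from_root([3, -2, 5])
# chain == [3, 1, 6]
```

Keeping the per-hop error small (and bounded) is what makes the chain effectively infinitely extensible for characterization purposes.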

Test scheme diagram using PPS-In to characterize very large PTP trees. Diagram includes RF splitter, boundary clocks, ordinary clocks, and RF cables.
Figure 4. An infinitely scalable test scheme with PPS-In

Summary 

The Time Synchronization infrastructure blueprints are available for everyone and are ready for cloud providers and operators. NVIDIA will continue to invest in high-precision time synchronization with the goal to enhance all product lines and solutions. 

The journey is not yet complete. Sharing our work with the Open Compute TAP community, and working with our partners to build more blueprints for various use cases will be key to helping this solution become common and relatively easy to deploy.

