Telemetry-Driven Network Quality and Reliability Monitoring with NVIDIA NetQ 4.0.0

NVIDIA NetQ 4.0.0 was recently released with many new capabilities. NVIDIA NetQ is a highly scalable, modern network operations tool that uses fabric-wide telemetry data to provide real-time visibility into, and troubleshooting of, the overlay and underlay network. NetQ can be deployed on customer premises or consumed as a cloud-based service (SaaS). For more details, refer to the NetQ datasheet.

NetQ 4.0.0 includes the following notable new features: 

  • CI/CD validation enhancements 
  • gNMI streaming of WJH events towards third-party applications 
  • SONiC support 
  • RoCE monitoring 
  • User interface improvements 

Refer to the NetQ 4.0.0 User Guide for details and all the other capabilities introduced. 

NVIDIA NetQ 4.0.0 user interface

Validation enhancements 

In the physical production network, NetQ validations provide insight into the live state of the network and help with troubleshooting. NetQ 4.0.0 provides the ability to:

  • include or exclude one or more of the various tests performed during the validation 
  • create filters to suppress false alarms or known errors and warnings 
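The two capabilities above can be pictured with a small sketch. This is not NetQ's implementation or API. The `Finding` record, the test names, and the glob-style filter format are all assumptions made up for illustration. It only shows the general idea of selecting which tests run and then suppressing known findings.

```python
from dataclasses import dataclass
from fnmatch import fnmatch


@dataclass(frozen=True)
class Finding:
    """One result produced by a validation test (hypothetical shape)."""
    test: str       # name of the test that produced the finding
    hostname: str   # switch the finding applies to
    severity: str   # "error" or "warning"
    message: str


def select_tests(all_tests, include=None, exclude=None):
    """Return the tests to run, honoring optional include/exclude lists."""
    selected = [t for t in all_tests if include is None or t in include]
    return [t for t in selected if exclude is None or t not in exclude]


def suppress(findings, filters):
    """Drop findings that match any suppression filter.

    Each filter is a dict of field name -> glob pattern. A finding is
    suppressed only when every field named in the filter matches.
    """
    def matches(finding, flt):
        return all(fnmatch(getattr(finding, k), pat) for k, pat in flt.items())

    return [f for f in findings if not any(matches(f, flt) for flt in filters)]


# Hypothetical test names; run everything except the link-speed test.
tests = select_tests(["mtu", "vlan", "link-speed"], exclude=["link-speed"])

findings = [
    Finding("mtu", "leaf01", "error", "MTU mismatch on swp1"),
    Finding("vlan", "lab-spine01", "warning", "VLAN 10 missing on peer"),
]
# Suppress everything reported by lab switches (a known, accepted condition).
remaining = suppress(findings, [{"hostname": "lab-*"}])

print(tests)
print([f.hostname for f in remaining])
```

Keeping test selection separate from suppression means a filtered finding is still produced by its test and is only hidden from the report, which is one reasonable way to treat known errors without losing the underlying check.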

gNMI streaming of WJH events 

NVIDIA What Just Happened (WJH) is a hardware-accelerated telemetry feature available on NVIDIA Spectrum switches, which streams detailed and contextual telemetry data for analysis. WJH provides real-time visibility into problems in the network, such as hardware packet drops due to misconfigurations, buffer congestion, ACLs, or layer 1 problems.

NetQ 4.0.0 supports gNMI (gRPC Network Management Interface) for collecting What Just Happened data from the NetQ Agent. YANG model details are available in the User Guide.
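gNMI clients address data by paths derived from a YANG model, conventionally written as slash-separated elements with optional [key=value] selectors. The sketch below splits such a path string into elements and keys. The interfaces path is a generic OpenConfig-style placeholder, not a WJH path (those are defined by the YANG model in the User Guide). The parser is a simplification that assumes key values contain no "/" or "]" characters.

```python
import re

# One path element: a name followed by zero or more [key=value] selectors.
_ELEM = re.compile(r"^(?P<name>[^\[\]]+)(?P<keys>(\[[^=\]]+=[^\]]*\])*)$")
_KEY = re.compile(r"\[([^=\]]+)=([^\]]*)\]")


def parse_path(path):
    """Split a gNMI-style path string into a list of {name, key} elements.

    Simplifying assumption: key values never contain '/' or ']'.
    """
    elems = []
    for part in path.strip("/").split("/"):
        m = _ELEM.match(part)
        if m is None:
            raise ValueError(f"malformed path element: {part!r}")
        keys = dict(_KEY.findall(m.group("keys")))
        elems.append({"name": m.group("name"), "key": keys})
    return elems


# Placeholder path for illustration only.
elems = parse_path("/interfaces/interface[name=swp1]/state")
for e in elems:
    print(e)
```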

SONiC support 

NetQ now monitors switches running the SONiC (Software for Open Networking in the Cloud) operating system as well as Cumulus Linux. SONiC support includes traces, validations, snapshots, events, service visibility, and What Just Happened. This is an early access feature.

RoCE monitoring

RDMA over Converged Ethernet (RoCE) provides the ability to write to compute or storage elements using remote direct memory access (RDMA) over an Ethernet network instead of using host CPUs. RoCE relies on congestion control and lossless Ethernet to operate. Cumulus Linux supports features that can enable lossless Ethernet for RoCE environments. NetQ allows users to view RoCE configuration and monitor RoCE counters with threshold crossing alerts. 
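A threshold crossing alert fires when a counter moves from below its threshold to at or above it, rather than on every sample that stays high. The following sketch illustrates that edge-triggered behavior in general terms. The counter name and threshold value are hypothetical, and this is not how NetQ itself is implemented or configured.

```python
def crossing_alerts(samples, thresholds):
    """Return (index, counter, value) for each upward threshold crossing.

    samples: list of dicts mapping counter name -> value, in time order.
    thresholds: dict mapping counter name -> alert threshold.
    An alert is recorded only when a counter goes from below its threshold
    to at or above it, not on every sample that remains above.
    """
    above = {name: False for name in thresholds}
    alerts = []
    for i, sample in enumerate(samples):
        for name, limit in thresholds.items():
            value = sample.get(name)
            if value is None:
                continue  # counter missing from this sample; keep prior state
            is_above = value >= limit
            if is_above and not above[name]:
                alerts.append((i, name, value))
            above[name] = is_above
    return alerts


# Hypothetical per-interval counter readings for one switch port.
samples = [
    {"pfc_pause_frames": 10},
    {"pfc_pause_frames": 120},
    {"pfc_pause_frames": 150},
    {"pfc_pause_frames": 40},
    {"pfc_pause_frames": 200},
]
alerts = crossing_alerts(samples, {"pfc_pause_frames": 100})
print(alerts)
```

Here the second and fifth samples trigger alerts, while the third does not because the counter was already above the threshold.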

User interface enhancements 

The NetQ GUI now shows switch details in the topology view. Premises can also be renamed and deleted from the GUI.

NVIDIA Air has been updated with NetQ 4.0.0. Check it out, and upgrade your environment to take advantage of all the new capabilities. To learn more, visit the NVIDIA Ethernet switching solutions webpage.