NVIDIA Clara Guardian

NVIDIA Clara™ Guardian is an application framework and partner ecosystem that simplifies the development and deployment of smart sensors with multimodal AI, anywhere in a healthcare facility. With a diverse set of pre-trained models, reference applications, and fleet management solutions, developers can build solutions faster—bringing AI to healthcare facilities and improving patient care.

Clara Guardian’s key components include healthcare pre-trained models for computer vision and speech, training tools, deployment SDKs, and NVIDIA Fleet Command. NVIDIA Fleet Command is a hybrid-cloud platform for securely managing and scaling AI deployments across millions of servers or edge devices at hospitals.

This makes it easy for ecosystem partners to add AI capabilities to common sensors that can monitor crowds for safe social distancing, measure body temperature, detect the absence of protective gear such as masks, or interact remotely with high-risk patients so that everyone in the healthcare facility stays safe and informed.

Applications and services can run on a wide range of hardware, allowing developers to securely deploy anywhere, from the edge to the cloud.

Time to Solution

Leverage high-performance, pre-trained models to build accurate AI in healthcare.

Cloud-Native, Edge First

Scale software quickly and deploy applications easily at the edge.

Secure Management

Securely manage and scale AI deployments across dozens or up to millions of servers or edge devices.

Healthcare-Specific, Pre-Trained Models

Clara Guardian For Speech

Clara Guardian for speech is a healthcare-domain-specific version of Riva's conversational AI capabilities.

  • For automatic speech recognition (ASR), models perform offline and streaming recognition and can automatically add punctuation, output word timestamps, and return top-n transcripts.
    CitriNet is the recommended new end-to-end, convolutional, Connectionist Temporal Classification (CTC) based ASR model. CitriNet models take in audio segments and transcribe them to letter, byte-pair, or word-piece sequences. CitriNet has been trained on an ASR dataset and, without any external language model (LM), reaches a word error rate (WER) of 6.22% on LibriSpeech test-other. It can run efficiently on a variety of hardware, including GPUs.
    The Conformer-CTC model is a non-autoregressive variant of the Conformer model for ASR that uses CTC loss and decoding instead of a Transducer.
  • For natural language understanding (NLU), deep learning models understand context via encoded vectors and provide appropriate outputs for specific language tasks like next-word prediction and text summarization.
  • For text to speech (TTS), a speech synthesis model is based on FastPitchHifiGanE2E. FastPitchHifiGanE2E is an end-to-end, non-autoregressive model that generates audio from text. It combines FastPitch and HiFiGan into one model and is trained jointly in an end-to-end manner.
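
CTC-based models such as CitriNet and Conformer-CTC produce one score vector per audio frame, and a decoder turns those per-frame outputs into text. The following is a minimal sketch of greedy CTC decoding, not the decoder that Riva or NeMo actually uses. The toy vocabulary, the blank index of 0, and the function name `ctc_greedy_decode` are assumptions made for illustration.

```python
from typing import List, Sequence

BLANK = 0  # assumed index of the CTC blank symbol in the vocabulary


def ctc_greedy_decode(
    frame_scores: Sequence[Sequence[float]],
    vocab: Sequence[str],
    blank: int = BLANK,
) -> str:
    """Pick the best symbol per frame, merge consecutive repeats, drop blanks."""
    # Best-scoring vocabulary index for every frame.
    best_path: List[int] = [
        max(range(len(scores)), key=lambda i: scores[i]) for scores in frame_scores
    ]

    decoded: List[str] = []
    previous = None
    for index in best_path:
        # Emit a symbol only when it differs from the previous frame
        # and is not the blank symbol.
        if index != previous and index != blank:
            decoded.append(vocab[index])
        previous = index
    return "".join(decoded)


# Toy example: hypothetical per-frame scores over a 4-symbol vocabulary.
vocab = ["<blank>", "c", "a", "t"]
frames = [
    [0.1, 0.7, 0.1, 0.1],    # c
    [0.1, 0.6, 0.2, 0.1],    # c (repeat, merged)
    [0.8, 0.1, 0.05, 0.05],  # blank
    [0.1, 0.1, 0.7, 0.1],    # a
    [0.1, 0.1, 0.1, 0.7],    # t
]
print(ctc_greedy_decode(frames, vocab))  # prints "cat"
```

A blank between two identical symbols keeps both of them, which is how CTC represents doubled letters. Production decoders often use beam search, optionally with a language model, instead of this greedy pass.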

Speech models (ASR, NLP, and TTS) can be used to capture, process, and respond to common requests that a patient might make when they are in a healthcare setting.
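
The word error rate (WER) quoted for CitriNet is the word-level edit distance between a reference transcript and the recognizer's output, divided by the number of reference words. The sketch below computes it with a standard dynamic-programming edit distance. The function name `word_error_rate` and the example sentences are invented for illustration, and the text normalization (casing, punctuation) that real evaluations usually apply first is omitted.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous_row[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous_row = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current_row = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous_row[j - 1] + (ref_word != hyp_word)
            deletion = previous_row[j] + 1
            insertion = current_row[j - 1] + 1
            current_row.append(min(substitution, deletion, insertion))
        previous_row = current_row

    return previous_row[-1] / len(ref)


# Hypothetical patient request: one substituted word and one dropped word
# out of five reference words.
wer = word_error_rate("the patient needs water now", "the patient need water")
print(f"WER = {wer:.2%}")  # prints "WER = 40.00%"
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.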

Clara Guardian For Computer Vision

Clara Guardian for computer vision is a healthcare-domain-specific version of DeepStream and Riva computer vision capabilities.

Clara Guardian contains pre-trained models for applications such as gesture recognition, heart rate monitoring, mask detection, and body pose estimation.

  • Body pose estimation detects the positions of key body joints and facial landmarks (eyes, ears, elbows, shoulders, wrists, hips, knees, ankles, nose, neck, etc.) and can be used to build patient monitoring AI models.
  • Gesture recognition models can recognize a set of common gestures (wave, okay, thumbs-up, stop, etc.).
  • Heart rate estimation obtains a person’s heart rate just by observing a video stream of their face.
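
As an illustration of how body pose keypoints can feed a patient monitoring feature, the sketch below computes the angle at a joint from three 2D keypoints, for example the elbow angle from the shoulder, elbow, and wrist. The `(x, y)` tuple format, the sample coordinates, and the function name `joint_angle_degrees` are assumptions made for this example. They do not describe the output format of the Clara Guardian pose model.

```python
import math
from typing import Tuple

Point = Tuple[float, float]  # assumed (x, y) keypoint in image coordinates


def joint_angle_degrees(a: Point, b: Point, c: Point) -> float:
    """Angle at keypoint b between the segments b->a and b->c, in degrees."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    norm1 = math.hypot(*v1)
    norm2 = math.hypot(*v2)
    if norm1 == 0 or norm2 == 0:
        raise ValueError("keypoints must not coincide with the joint")

    cosine = (v1[0] * v2[0] + v1[1] * v2[1]) / (norm1 * norm2)
    # Clamp to [-1, 1] to guard against floating-point rounding.
    cosine = max(-1.0, min(1.0, cosine))
    return math.degrees(math.acos(cosine))


# Hypothetical keypoints: an arm bent at a right angle at the elbow.
shoulder, elbow, wrist = (0.0, 0.0), (1.0, 0.0), (1.0, 1.0)
print(round(joint_angle_degrees(shoulder, elbow, wrist)))  # prints 90
```

Tracking such angles over time is one simple way to turn raw keypoints into features for downstream monitoring logic. The angle is computed in the image plane, so it is only an approximation of the true 3D joint angle.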

Pre-compiled NVIDIA TensorRT engines are optimized for NVIDIA GPUs.

Secure Management with Fleet Command

NVIDIA Fleet Command is a hybrid-cloud platform for securely and remotely deploying, managing, and scaling AI across dozens or up to millions of servers or edge devices. Instead of spending weeks planning and executing deployments, administrators can scale AI to hospitals in minutes. With the capability of an entire IT division in a single control plane, administrators can manage the lifecycle of AI applications, update system software over the air, and remotely monitor and access systems.

See how our customers are using it:

GTC talk for more technical details

An End-to-End AI Solution

Clara Guardian includes GPU-optimized components that can accelerate every stage of your application development.


  • A collection of healthcare-specific, pre-trained computer vision and conversational AI models for a variety of use cases.
  • NVIDIA NeMo to build conversational AI models for ASR, NLP, and TTS.
  • TAO Toolkit to create highly accurate computer vision models with zero coding.
  • NVIDIA Riva for deploying conversational AI models that fuse vision, speech, and other sensor data.
  • NVIDIA DeepStream SDK for a multi-platform scalable video analytics framework with Transport Layer Security (TLS) that can deploy on the edge and connect to any cloud.


“Our AI-powered IOT platform, running on NVIDIA Clara Guardian, is used by leading hospitals, such as Northwestern Medicine, to screen hundreds of thousands of people for elevated temperatures and help front-line providers safely care for patients during the pandemic. Clara Guardian made smart hospitals at the edge possible, enabling our customers to increase staff productivity by over six-fold, saving millions of dollars in staffing costs while improving patient care.”

Andrew Gostine, MD, CEO of Whiteboard Coordinator

“We have been using NVIDIA GPUs in Ouva solutions from day one. With our new solution, we are aiming to allow nurses to monitor hundreds of patients in real time. The Clara Guardian framework allows us to build a scalable and efficient solution in no time, allowing our team to focus on our core competency—developing algorithms that unlock the potential of remote care.”

Dogan Demir, CEO of Ouva


Intelligent Video Analytics
Speech and NLP
  • NVIDIA NeMo, an open-source toolkit for building conversational AI models
  • NVIDIA Riva SDK for deploying conversational AI models that fuse vision, speech, and other sensor data
Edge Hardware

Explore Fleet Command to securely manage and scale AI deployments.

Disclaimer: Clara SDKs and samples are for developmental purposes only and cannot be used directly for clinical procedures.