Medical AI has reached an inflection point. While vision-language models (VLMs) have shown promise in medical imaging, they have lacked the systematic, transparent reasoning that clinicians need to trust AI-assisted diagnoses. NVIDIA Clara is changing this: a family of models, tools, and recipes built to accelerate scientific discovery, analyze medical images, and provide a foundational understanding of human health, biology, and chemistry.
Specifically, Clara Reason introduces multimodal chain-of-thought models that mirror radiologists’ thinking, providing step-by-step diagnostic reasoning with explanations that clinicians can validate and trust.
NVIDIA is expanding beyond traditional image analysis to create a medical AI reasoning ecosystem that combines foundational datasets with multimodal models to deliver interpretable decision support.
This post details the technical implementation of Clara NV-Reason-CXR-3B, a 3-billion-parameter VLM specialized in chest x-ray analysis. We cover the dataset creation methodology that captures radiologist thought processes through voice annotations, the two-stage training pipeline combining supervised fine-tuning with Group Relative Policy Optimization, and validation results from clinical institutions.
Traditional medical AI approaches lack transparent reasoning
Today’s medical AI models often operate as black boxes, providing diagnoses without explaining their reasoning. This creates a trust barrier for clinicians who need to understand and validate AI recommendations before incorporating them into patient care decisions.
Traditional approaches to medical AI have focused on improving accuracy metrics without addressing the fundamental need for explainability. A radiologist doesn’t simply identify an abnormality—they systematically review anatomical structures, consider differential diagnoses, and articulate their thought process. The final diagnosis is more than just a label; it’s the product of an internal thought process, built on years of experience.
Reasoning AI models have demonstrated significant improvements in solving math, programming, and logic questions. By thinking step-by-step before answering, they’re able to break down tasks into subgoals and solve complex multi-step problems. Similarly, in medical AI, following the radiologist’s thought process allows the model to go deeper into each step and take on complex medical problems.
How does Clara Reason provide transparent medical AI reasoning?
Clara Reason addresses the explainability challenge through an architecture that combines multimodal perception with structured reasoning capabilities.
NVIDIA researchers contribute reasoning capabilities to Clara Reason through the Clara NV-Reason-CXR-3B model, a VLM specialized in chest x-ray analysis. It’s designed to think like a radiologist when analyzing chest radiographs, providing a full chain-of-thought that mimics the internal thinking of the physician.
This enables the AI to explain its diagnostic reasoning in detail. The model is designed to respond in the style of a teacher, a senior radiologist explaining both the problem and the solution, and offers:
- Chain-of-thought processing: The reasoning engine generates step-by-step diagnostic analysis, including:
  - Systematic anatomical review
  - Identification of normal and abnormal findings
  - Differential diagnosis consideration
- Clinical output generation:
  - Main findings
  - Step-by-step reasoning pathway
  - Differential diagnoses and their likelihood
  - Recommendations for follow-up or clinical correlation
  - Multi-step follow-up chat for clarification
  - Structured report generation
According to Dr. Mariam Aboian, Assistant Professor at Children’s Hospital of Philadelphia (CHOP), “For the first time, generative AI is describing what is going on in radiologists’ heads and their chain-of-thought as they are thinking through the study, identifying the findings, and organizing them to determine the diagnosis. This provides innovation in explainability, which is critically needed for clinical implementation of AI and communication with physicians and medical providers across healthcare.”
Creating a dataset that captures how radiologists think
Through collaboration with the National Institutes of Health (NIH), Children’s Hospital of Philadelphia (CHOP), and VinBrain, NVIDIA researchers created the first dataset that captures radiologists’ thought processes. Unlike traditional datasets that focus on labels or reports, this dataset includes 1-2 pages of detailed radiological thinking per image, dictated by radiologists as they read each study.
Systematic examination protocol
Radiologists were asked to dictate all their thoughts, deliberations, and uncertainties when reading a chest x-ray, loosely matching the following order:
Quality Assessment → Medical Devices → Airways → Lungs (R/L) → Mediastinum → Heart → Abdomen → Bones → Summary
Each annotation takes 7-15 minutes and is broken down into 10-20 detailed distinct observations and thoughts such as, “I see haziness in the right lower lobe, which makes me consider…”
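For illustration, one dictated study might be stored as a record like the following minimal sketch. The field names and schema here are hypothetical, not the actual dataset format:

# Hypothetical structure for one dictated study; field names are
# illustrative assumptions, not the actual dataset schema.
annotation = {
    "image_id": "cxr_000123",
    "reading_order": [
        "quality", "devices", "airways", "lungs_right", "lungs_left",
        "mediastinum", "heart", "abdomen", "bones", "summary",
    ],
    "observations": [
        {
            "section": "lungs_right",
            "text": (
                "I see haziness in the right lower lobe, "
                "which makes me consider..."
            ),
            # Optional region of interest as (x1, y1, x2, y2) pixel coordinates
            "roi": [412, 518, 640, 760],
        },
        # ...typically 10-20 observations per study
    ],
}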
Innovative data collection
The team developed an annotation tool that captures authentic radiologist thinking. The key insight is the simplicity of the implementation:
- Voice recordings with speech-to-text capture natural clinical reasoning
- Basic ROI tools link observations to image regions
- Multi-language transcription enables global collaboration (transcribe and translate into English)
- Raw audio/text files can be formatted for training—no proprietary tools required
Teams can implement a similar approach using existing viewers with basic annotation capabilities, or simply collect voice recordings alongside image reviews. The main goal here is to capture the radiologist’s thought process, not the specific tooling.
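As a minimal sketch, the speech-to-text step could be handled with an open source model such as Whisper. This is an assumption for illustration; the post does not name the transcription tool actually used:

import whisper

# Load an open source speech-to-text model (tool choice is an assumption)
stt_model = whisper.load_model("medium")

# task="translate" transcribes non-English dictation directly into English,
# supporting the multi-language collaboration described above
result = stt_model.transcribe("radiologist_dictation_000123.wav", task="translate")
print(result["text"])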
Annotation focus areas include:
- Differential diagnoses: Include uncertainties and clinical reasoning
- Negative findings: Explicitly state what’s normal or absent to provide a complete clinical picture
In addition, the training dataset has been expanded with synthetic data distilled from GPT-OSS 120B based on chest x-ray reports (MIMIC-CXR, Open-I), with the radiologist reasoning data serving as examples. The synthetic dataset contains approximately 100K data points.
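A rough sketch of how such a distillation prompt might be assembled is shown below. The prompt wording and function are illustrative assumptions, not the actual pipeline:

# Illustrative distillation prompt construction; the wording and structure
# are assumptions, not the actual synthetic data pipeline.
def build_distillation_prompt(report: str, example_reasoning: str) -> str:
    """Ask the teacher model (GPT-OSS 120B) to expand a chest x-ray
    report into radiologist-style step-by-step reasoning."""
    return (
        "You are a senior radiologist thinking aloud while reading a "
        "chest x-ray. In the style of the example below, write the "
        "detailed step-by-step reasoning that would lead to this report.\n\n"
        f"### Example reasoning\n{example_reasoning}\n\n"
        f"### Report\n{report}\n\n"
        "### Reasoning\n"
    )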
NV-Reason-CXR-3B training pipeline
The NV-Reason-CXR-3B model uses the Qwen2.5-VL-3B-Instruct VLM as its starting point and follows the approach popularized by DeepSeek-R1.
Stage 1: Supervised fine-tuning (SFT)
The initial stage trains the model on expert radiologist reasoning data using approximately 100K reasoning examples that combine our original annotations with synthetic data. Training runs on four nodes with eight NVIDIA H100 GPUs each (32 GPUs total) for 4 hours. The objective is to teach the model to generate structured diagnostic reasoning that follows authentic radiologist thought patterns.
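For orientation, an SFT training record could look like the following sketch in chat format. The THINK/ANSWER target mirrors the model output shown later in this post, but the exact schema is an assumption:

# Hypothetical SFT record in chat format; the THINK/ANSWER target mirrors
# the model's output format, but the exact schema is assumed.
sft_example = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": "cxr_000123.png"},
                {"type": "text", "text": "Find abnormalities and support devices."},
            ],
        },
        {
            "role": "assistant",
            "content": (
                "THINK: We'll begin with the quality assessment of this "
                "AP chest x-ray... "
                "ANSWER: Cardiomegaly, Edema, Pleural Effusion"
            ),
        },
    ]
}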
Stage 2: Group Relative Policy Optimization (GRPO)
The second stage uses reinforcement learning to refine reasoning quality on larger datasets without requiring explicit reasoning annotations. Training uses an expanded chest x-ray dataset with verified diagnostic labels and a reward function based on the percentage of correctly identified abnormalities and diagnoses. This differs from traditional GRPO applications in math and logic tasks, which typically use binary rewards.
Training uses the same infrastructure as Stage 1 and runs for 4 days. This approach allows the model to learn from a broader dataset while preserving the structured thinking patterns established in the supervised fine-tuning stage.
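Because the reward is described as the percentage of correctly identified findings rather than a binary score, it could be sketched as below. The exact matching rules (set comparison, handling of normal studies) are assumptions:

# Minimal sketch of a percentage-based GRPO reward; the exact scoring
# rules (set matching, handling of normal studies) are assumptions.
def reward(predicted: set[str], reference: set[str]) -> float:
    """Fraction of reference findings the completion identified,
    in contrast to the 0/1 exact-match rewards common in math tasks."""
    if not reference:
        # Normal study: full reward only if the model predicts no findings
        return 1.0 if not predicted else 0.0
    return len(predicted & reference) / len(reference)

# Two of three reference findings recovered -> reward of about 0.67
print(reward({"Cardiomegaly", "Edema"},
             {"Cardiomegaly", "Edema", "Pleural Effusion"}))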
What is the clinical validation and impact of Clara Reason?
Clara Reason acts as an AI co-pilot for radiologists, saving time while enhancing diagnostic confidence through transparent reasoning. The model demonstrates strong alignment with clinical thinking, validated by board-certified radiologists.
Key benefits include:
- Time savings: Acts as a co-pilot, explains decisions, and can write a structured report if necessary
- Enhanced accuracy: Following the radiologist’s internal thought process helps with complex medical decisions
- Built-in trust: Transparent explanation of reasoning pathways
- Teaching assistance: Explainability of decisions provides confidence and educational value
Core capabilities include:
- Radiologist-aligned chain-of-thought: Captures actual internal thinking processes, not generic AI reasoning
- Systematic examination patterns: Follows clinical protocols
- Transparent decision-making: Every diagnosis includes explainable reasoning pathways
- Confidence estimation: Calibrated uncertainty with clinical context
“The CXR reasoning model is an amazing opportunity for assisting not only referring doctors but also patients who would like to learn more about the thought process of establishing differential diagnoses using imaging findings from all anatomic structures covered in the field of view, along with patients’ clinical information and symptoms,” said Ismail Baris Turkbey, M.D., F.S.A.R., Senior Clinician, NCI/CCR/MIB, National Institutes of Health. “Additionally, this novel tool has significant potential to serve as an educational assistant for trainees in radiology and medicine.”
How does Clara Reason transform clinical workflows?
Clara Reason is designed for the following primary use cases:
- Clinical decision support: Radiologists use Clara Reason as a “second reader” that provides detailed reasoning they can quickly validate. The transparent thought process allows clinicians to identify where they agree or disagree with the AI’s assessment, enhancing diagnostic confidence.
- Medical education: Medical schools and residency programs integrate Clara Reason to help trainees develop systematic diagnostic thinking. The model’s detailed reasoning serves as an always-available teaching assistant that demonstrates expert-level analysis.
- Research applications: Researchers use Clara Reason to analyze large imaging datasets with consistent, documented reasoning—enabling new insights into disease patterns and diagnostic variations across populations.
How to integrate Clara Reason into a medical AI application
Use the following quick start example to integrate Clara Reason into your medical AI application:
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor
from PIL import Image
# Load the Clara Reason model 
model_name = "nvidia/NV-Reason-CXR-3B"
model = AutoModelForImageTextToText.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
).eval().to("cuda")
processor = AutoProcessor.from_pretrained(model_name)
# Load chest x-ray image
image = Image.open("chest_xray.png")
# Prepare input with clinical context
messages = [
    {
        "role": "user",
        "content": [
            {
                "type": "image",
                "image": image,
            },
            {
                "type": "text",
                "text": "Find abnormalities and support devices."
            }
        ]
    }
]
# Create prompt using chat template
text = processor.apply_chat_template(
    messages,
    add_generation_prompt=True
)
# Process inputs
inputs = processor(
    text=text,
    images=[image],
    return_tensors="pt"
).to(model.device)
# Generate reasoning with chain-of-thought
generated_ids = model.generate(
    **inputs,
    max_new_tokens=2048
)
# Trim the generated ids to remove the input ids
trimmed_generated_ids = [
    out_ids[len(in_ids):]
    for in_ids, out_ids in zip(inputs.input_ids, generated_ids)
]
# Decode the response
generated_text = processor.batch_decode(
    trimmed_generated_ids,
    skip_special_tokens=True,
    clean_up_tokenization_spaces=False
)[0]
print("Output:")
print(generated_text)
Example output:
THINK: We'll begin with the quality assessment of this AP chest x-ray. As you can see, the lung fields are relatively well visualized, although the patient's position and the fact that it's a portable AP view may limit the evaluation to some extent. I think the quality is acceptable for this type of examination.
Now, we are looking at the medical devices and instrumentation. I don't see any medical devices such as central venous catheters, endotracheal tubes, or chest tubes. However, I do notice that the patient is likely in a clinical setting, given the portable nature of the examination.
The central airways appear patent. The trachea is visible and not deviated, and the right and left main bronchi appear unremarkable. I don't see any foreign objects or abnormalities in the central airways.
Moving on to the right lung, we are evaluating the upper lobe first. The aeration of the right upper lobe appears reduced, which could be suggestive of some volume loss or atelectasis. I don't see any distinct masses or nodules, but there are some subtle opacities that could represent infiltrates or edema.
Next, we will evaluate the right middle and lower lobes. The lung volumes appear low, which could be indicative of some atelectasis or volume loss. I see some patchy opacities in the right lower lobe, which may represent pulmonary edema or infiltrates. The costophrenic angle is slightly blunted, which could be significant for a small pleural effusion.
Now, let's examine the left lung. The left upper lobe appears similar to the right, with reduced aeration and some subtle opacities that could represent infiltrates or edema. The left lower lobe also shows some patchy opacities and a slightly blunted costophrenic angle, suggestive of a small pleural effusion.
We are looking at the pleura now, and as I mentioned before, the costophrenic angles are slightly blunted bilaterally, which is concerning for small bilateral pleural effusions. I don't see any evidence of pneumothorax.
The mediastinum appears enlarged, which could be indicative of cardiomegaly. The cardiac silhouette is prominent, and I think this finding may be meaningful for moderate cardiomegaly.
The hila appear unremarkable, without any significant adenopathy or masses. The lung vasculature appears prominent, which could be suggestive of pulmonary edema.
The heart, as I mentioned before, appears enlarged, with a cardiothoracic ratio that is elevated. This finding is concerning for cardiomegaly.
Now, we are looking at the upper abdomen, and I see some normal-appearing bowel gas patterns. There is no free air under the diaphragm, which is important to note.
Finally, we are looking at the bony structures. The clavicles, scapulae, and ribs appear unremarkable, without any focal lesions or fractures. The vertebral bodies are well visualized and appear normal.
In summary, the most significant findings in this patient are the presence of mild pulmonary edema, small bilateral pleural effusions, and moderate cardiomegaly. These findings could represent a range of clinical conditions, including heart failure or other cardiopulmonary diseases. 
ANSWER: Cardiomegaly, Edema, Enlarged Cardiomediastinum, Lung Opacity, Pleural Effusion
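If your application needs the reasoning and the final labels separately, the two sections can be split on the ANSWER: marker, as in this minimal sketch (robustness checks omitted):

# Minimal sketch: split the generation into reasoning and final labels
# using the THINK/ANSWER markers shown above (robustness checks omitted)
def split_reasoning(output: str) -> tuple[str, list[str]]:
    think, _, answer = output.partition("ANSWER:")
    reasoning = think.removeprefix("THINK:").strip()
    labels = [label.strip() for label in answer.split(",") if label.strip()]
    return reasoning, labels

reasoning, findings = split_reasoning(generated_text)
print(findings)  # ['Cardiomegaly', 'Edema', 'Enlarged Cardiomediastinum', ...]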
Get started with Clara Reason
Clara Reason introduces chain-of-thought models that mirror radiologists’ thinking—providing step-by-step diagnostic reasoning with explanations that clinicians can validate and trust. More specifically:
- NV-Reason-CXR-3B generates step-by-step diagnostic reasoning for chest x-ray analysis, producing detailed thought processes rather than diagnostic labels alone.
- Dataset methodology captures radiologist thought processes through voice recordings during image analysis, creating 1-2 pages of detailed reasoning per chest x-ray.
- Two-stage training with GRPO enables reasoning with minimal annotated data by first learning from expert reasoning examples, then using reinforcement learning to refine reasoning quality on larger datasets without requiring reasoning annotations.
This breakthrough in medical AI is powered by collaboration with NIH, CHOP, and VinBrain.
Ready to get started?
- Download NV-Reason-CXR-3B checkpoints from Hugging Face for local development
- Visit NVIDIA-Medtech/NV-Reason-CXR on GitHub for training and inference examples
Stay up to date by subscribing to NVIDIA news, and following NVIDIA Healthcare on LinkedIn, X, and YouTube.