NVIDIA Computer Vision

Empower your devices to perceive and understand the world around us with powerful software that’s masterful, scalable, and tested.




NVIDIA software enables the end-to-end computer vision workflow—from model development to deployment—for individual developers, higher education and research, and enterprises.

Computer vision is a field of technology that enables devices like smart cameras to acquire, process, analyze, and interpret images and videos. For example, a vehicle's driver assistance system built on computer vision algorithms uses cameras and other sensors not only to display what’s in front of and behind the vehicle, but also to perceive it by identifying and classifying regions or points of interest within each image frame. In this case, computer vision has a safety application: helping the vehicle operator navigate around road debris, other vehicles, animals, and people. Similarly, farmers might rely on computer vision-enabled devices to automatically identify weeds, and areas where crops are growing well, across a large field to increase yield. Today’s computer vision tasks like these are based on artificial intelligence and, more specifically, deep learning, a type of machine learning loosely patterned after the brain. Deep learning-based computer vision models enable devices to perform and adapt like a human expert while requiring significantly less input.







Computer Vision Techniques

Most computer vision techniques begin with a model, or a mathematical algorithm, that has been trained with volumes of data to accomplish a specific task. Some of the common techniques include:




Classification

Classification involves determining what object is in an image or video frame. Classification models are usually trained with a large dataset to identify simple objects like dogs, cats, or chairs, or very specific ones like the types of vehicles in a road scene. The quality of the classification output depends on the training data used: the larger and more diverse the training data, the more accurate the results tend to be.
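As a rough illustration of the final step of classification, the sketch below turns a model's raw per-class scores into probabilities with a softmax and picks the most likely label. The labels and scores are hypothetical, and the scores stand in for the output of a trained model; this is not the method of any specific NVIDIA model.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max score for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(scores, labels):
    """Return the most probable label and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

# Hypothetical labels and scores for a single image (assumed values).
LABELS = ["dog", "cat", "chair"]
label, confidence = classify([2.0, 0.5, -1.0], LABELS)
print(label, round(confidence, 3))
```

In practice the scores would come from a trained network, and the probability can be compared against a threshold to decide whether to trust the prediction.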


Detection

Detection involves locating one or more objects within an image or a video frame. The algorithm outputs a rectangular bounding box around each detected object to indicate its location in the image. Object detectors may be trained to detect cars, road signs, people, or other objects of interest within an image or a video frame.
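Bounding boxes are commonly compared using intersection over union (IoU), which measures how much a predicted box overlaps a reference box. The sketch below is a minimal version, assuming boxes are given as (x_min, y_min, x_max, y_max) in pixels; the example coordinates are made up.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the overlapping rectangle, if there is one.
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])

    # Clamp at zero so non-overlapping boxes have no intersection area.
    inter = max(0, ix_max - ix_min) * max(0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Hypothetical predicted and reference boxes.
predicted = (10, 10, 50, 50)
ground_truth = (30, 30, 70, 70)
overlap = iou(predicted, ground_truth)
print(round(overlap, 3))
```

An IoU of 1.0 means the boxes coincide, and 0.0 means they do not overlap at all.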


Segmentation

Segmentation involves locating objects or regions of interest precisely in an image by assigning a label to every pixel. This way, pixels with the same label share similar characteristics, such as color or texture. Segmentation models are commonly used in medical imaging for tasks like automatically detecting tumors in magnetic resonance imaging (MRI) scans.
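To make the idea of per-pixel labels concrete, the toy sketch below labels each pixel of a tiny grayscale image by comparing its brightness to a threshold. Thresholding is a deliberately simple stand-in chosen for illustration; deep learning segmentation models learn their labels from data. The image values and the threshold are assumptions.

```python
from collections import Counter

def segment_by_threshold(image, threshold):
    """Label each pixel 1 (region of interest) if brighter than threshold, else 0."""
    return [[1 if pixel > threshold else 0 for pixel in row] for row in image]

def label_counts(mask):
    """Count how many pixels received each label."""
    return Counter(label for row in mask for label in row)

# Hypothetical 4x4 grayscale image with a bright 2x2 region in the middle.
image = [
    [10, 12, 11, 9],
    [13, 200, 210, 8],
    [12, 205, 220, 10],
    [9, 11, 10, 12],
]
mask = segment_by_threshold(image, threshold=128)
counts = label_counts(mask)
for row in mask:
    print(row)
```

The resulting mask has the same shape as the image, which is what lets a segmentation output outline a region far more precisely than a bounding box.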


Image Synthesis

Image synthesis involves artificially generating images containing certain desired objects or content. Generative adversarial networks, or GANs, are a type of neural network commonly used for synthesizing these images, or even frames of a video. A common application of image synthesis is text-to-image translation, which involves using GANs to generate images based on a textual description.





Artificial Intelligence-Based Computer Vision Workflow

The computer vision workflow is highly dependent on the task, model, and data. A typical, simplified AI-based end-to-end CV workflow involves three key stages: Model and Data Selection, Training and Testing/Evaluation, and Deployment and Execution.


Let’s look at these stages using the CV detection technique to identify a dog (classification and segmentation-based techniques would follow the same workflow).



Finding Fido: Developing an AI-Based, Object-Detection CV Workflow

Challenge: You want to build software for a monitoring system that automatically detects when your dog arrives at or leaves through the backdoor.


Three-Stage Solution:


Model and Data Selection


Select an object-detection model.

Collect photos of your dog (let's call him Fido) that you can use to train and fine-tune your model to recognize Fido.


Training and Testing/Evaluation


Train your model on one set of photos of Fido, then test it on a different set to confirm the model's accuracy in detecting him.


Deployment and Execution


Deploy the trained model to hardware connected to an installed camera so that it can monitor the door and detect the next time your dog leaves the house.
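Once a deployed detector reports, frame by frame, whether the dog is visible, the monitoring logic still has to turn that stream into "arrived" and "left" events. The sketch below shows one possible approach, assuming a hypothetical list of per-frame booleans from the detector and a made-up `min_frames` setting that filters out single-frame misdetections. Treating "in view" as "arrived" is a simplification.

```python
def door_events(detections, min_frames=3):
    """Turn per-frame detections (True = dog visible) into arrive/leave events.

    A state change is reported only after `min_frames` consecutive frames
    disagree with the current state.
    """
    events = []
    present = False  # assume the dog is not in view when monitoring starts
    streak = 0       # consecutive frames that disagree with the current state
    for frame_index, seen in enumerate(detections):
        if seen != present:
            streak += 1
            if streak >= min_frames:
                present = seen
                streak = 0
                events.append((frame_index, "arrived" if present else "left"))
        else:
            streak = 0
    return events

# Hypothetical detector output: one spurious hit, then a real visit.
frames = [False, False, True, False, True, True, True, True,
          False, False, False, False]
events = door_events(frames)
print(events)
```

A larger `min_frames` makes the system slower to react but less sensitive to detector noise; the right value depends on the camera's frame rate and the detector's error rate.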


Below, a high-level diagram summarizes the AI-based CV solution.


Figure: High-level diagram of the AI-based computer vision solution.

NVIDIA enables the end-to-end CV workflow. NVIDIA not only provides AI-based pre-trained models, but also tools for Training and Testing/Evaluation and software application frameworks for Deployment and Execution. Learn more below about how NVIDIA enables every stage of CV development.





Get Started with NVIDIA’s Pre-Trained Models for Computer Vision


Developing models for these techniques on your own would require a lot of training data, time, and expertise. Here’s the good news: you do not have to be an expert to get started. NVIDIA hosts a number of pre-trained models, already built and ready to use, so you can start developing your own computer vision solutions. Start with NGC, our GPU-accelerated software hub, to learn about computer vision models and resources, as well as other deep learning-based speech and natural language processing use cases and application frameworks.


EXPLORE PRE-TRAINED MODELS ON NGC CATALOG






Develop the End-to-End Computer Vision Workflow

Start with NVIDIA pre-trained models, TAO, and DeepStream to make the end-to-end computer vision AI development process easier.




AI Model Adaptation Framework

NVIDIA TAO

Fine-tune pre-trained models with custom data to produce highly accurate computer vision and conversational AI models in hours rather than months.

LEARN MORE

Streaming Analytics Toolkit

NVIDIA DeepStream SDK

Build real-time vision AI applications for multi-sensor processing and for video, audio, and image understanding.



LEARN MORE

Smart Infrastructure & Cities

NVIDIA Metropolis

Build an end-to-end, video-based analytics platform or use one from a partner within the industry ecosystem.



LEARN MORE


Explore Computer Vision Across NVIDIA Software

Learn how to develop computer vision applications using NVIDIA's industry-specific software products and platforms.


Healthcare

Clara

Develop computer vision models for gesture recognition, heart rate monitoring, mask detection, and body pose estimation in a hospital room to detect falls. Build, manage, and deploy workflows in medical imaging, medical devices with streaming video, and smart hospitals.

Learn More

Automotive

DRIVE

Develop end-to-end (E2E) computer vision solutions for the autonomous vehicle (AV) and the intelligent cockpit (IX). Collect and generate computer vision data, and train DNN models using the E2E simulation platform (DRIVE Sim).

Learn More

Video Streaming

Maxine SDK

Create virtual collaboration and content creation applications with video effects, audio effects, and augmented reality.

Learn More

Multimodal Conversation

Riva

Develop multimodal conversational AI applications by fusing vision, audio, and other sensor inputs simultaneously.

Learn More

Envision Next-Generation Computer Vision

Discover new technologies and innovative computer vision research at NVIDIA.


Research

Emerging Innovation


Learn what problems our computer vision research engineers and data scientists have been solving. Read our latest publications.


Learn More

Robotics

NVIDIA Isaac Sim


Develop, test, train, and manage robots in virtual environments. Use computer vision for manipulation, navigation, and synthetic data generation.


Learn More


Explore NVIDIA’s GPU-Accelerated Libraries and Optimization Platform

Learn how NVIDIA’s libraries and optimization platform accelerate computer vision on GPUs.


Data Pipeline Accelerator

Data Loading Library (DALI)

Load and process computer vision and audio data using GPUs. Use directly in TensorFlow, PyTorch, MXNet, and PaddlePaddle models.

Learn More

3D Deep Learning Research Library

NVIDIA KAOLIN Library

Generate synthetic data. Render and visualize 3D training datasets.



Learn More

Image and Signal-Processing Library

NVIDIA Performance Primitives (NPP)

Deploy ready-to-use, domain-specific, high-performance functions for image, video, and signal processing.

Learn More

Embedded Computer Vision and Image Processing Library

Vision Programming Interface (VPI)

Implement asynchronous computer vision and image processing applications in real time.

Learn More

Image Decoding Libraries

nvJPEG and nvJPEG2000

Accelerate processing of JPEG and JPEG2000 images.




Learn More

Motion Flow Generation

Optical Flow SDK

Recognize, classify, and track objects and actions in a video stream by enhancing flow-vector computation between frames using GPUs.



Learn More

Inference Optimizer and Runtime

TensorRT

Enable delivery of low latency and high throughput for inference applications.




Learn More

Your World, Powered by NVIDIA Computer Vision

Get Started With Frequently Asked Questions

Computer vision is more than research. It delivers practical, real-world solutions that change lives. NVIDIA’s deep expertise in artificial intelligence and high-performance computing provides endless opportunities to meaningfully impact the world.


Learn More

Learn the Fundamentals of Computer Vision

New to computer vision? Want a primer before jumping in? Learn the Fundamentals of Deep Learning with hands-on exercises for CV in this eight-hour course offered by the Deep Learning Institute. You’ll learn how to train deep learning models from scratch and use pre-trained models, experiment with different model architectures, explore deep learning tools and techniques, and work with datasets to improve model accuracy. You’ll also earn a certification to show your accomplishment.


JUMP IN

What’s New in Computer Vision

Meta Works with NVIDIA to Build Massive AI Research Supercomputer


Meta’s AI supercomputer — the largest NVIDIA DGX A100 customer system to date — will deliver 5 exaflops of AI performance.

NVIDIA Jetson-based Robots Excel in DARPA Underground Competition


With NVIDIA Jetson embedded platforms, teams at the DARPA SubT Challenge detected objects with both high accuracy and high throughput.

Advanced Kernel Profiling with the Latest Nsight Compute


Nsight Compute kernel profiler now includes Range Replay, Memory Analysis, and Guided Analysis enhancements.


Computer Vision: Real-World Applications

No challenge is too small and no company too big for computer vision. See innovative solutions in action—from startups to global manufacturers.

Improving Mobility for People with Low Vision (Biel Glasses)

Increasing Vehicle Quality Using Computer Vision and AI (Audi)


Deploying NGC Models with Maximo Visual Inspection (IBM)


What challenges are you facing in building computer vision solutions?

We want to hear about your pain points in developing computer vision solutions so we can see how to help.

Partnering for Success

Global challenges take a community. We support you in tackling challenges with powerful solutions to meet your exact needs.

Airmeet
Airsmat
BMW
King's College London
Microsoft
Miovision
Notch
Ping An
Quantiphi
Smartcow
T-Mobile
Touchcast
Verizon

The World of Computer Vision Solutions is Powered by NVIDIA.

JOIN US