OpenCL

OpenCL™ (Open Computing Language) is a low-level API for heterogeneous computing that runs on CUDA-powered GPUs. Using the OpenCL API, developers can launch compute kernels written using a limited subset of the C programming language on a GPU.

OpenCL support is included in the latest NVIDIA GPU drivers, available at www.nvidia.com/drivers.

In addition to OpenCL, NVIDIA supports a variety of GPU-accelerated libraries and high-level programming solutions that enable developers to get started quickly with GPU Computing.

OpenCL is a trademark of Apple Inc., used under license by Khronos.

NVIDIA OpenCL SDK Code Samples


OpenCL Multi Threads
This sample shows the implementation of multi-threaded heterogeneous computing workloads with tight cooperation between CPU and GPU. It uses the new OpenCL 1.1 features: user events, thread-safe API calls, and event callbacks.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac


Using Inline PTX with OpenCL
A simple test application that demonstrates the new CUDA 4.0 driver ability to embed PTX in an OpenCL kernel.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac


OpenCL Marching Cubes Isosurfaces
This sample extracts a geometric isosurface from a volume dataset using the marching cubes algorithm. It uses the scan (prefix sum) function from the oclScan SDK sample to perform stream compaction.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
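
The stream compaction step mentioned for the marching cubes sample can be illustrated on the CPU. The following is a minimal Python sketch, not the SDK's OpenCL code: it flags the elements to keep, runs an exclusive scan over the flags to get each survivor's output index, then scatters. The names `exclusive_scan` and `compact` and the example data are assumptions made for illustration.

```python
def exclusive_scan(values):
    """Return a list where out[i] is the sum of values[0:i]."""
    out, running = [], 0
    for v in values:
        out.append(running)
        running += v
    return out


def compact(items, keep):
    """Keep items for which keep(item) is true, preserving their order."""
    flags = [1 if keep(x) else 0 for x in items]
    offsets = exclusive_scan(flags)  # output slot for each kept item
    total = (offsets[-1] + flags[-1]) if items else 0
    out = [None] * total
    for i, x in enumerate(items):  # on a GPU: one work-item per index i
        if flags[i]:
            out[offsets[i]] = x
    return out


# Hypothetical per-voxel vertex counts; keep only the non-empty voxels.
print(compact([0, 5, 0, 0, 7, 2, 0], lambda v: v != 0))  # [5, 7, 2]
```

Because every element writes to a distinct slot computed from the scan, the scatter loop has no dependencies between iterations, which is what makes the pattern suitable for parallel execution.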


OpenCL Tridiagonal
Efficient matrix solvers for a large number of small independent tridiagonal linear systems. OpenCL implementation of three different solvers: Parallel Cyclic Reduction, Cyclic Reduction, and Sweep (Gauss elimination plus a reordering optimization for full coalescing).

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
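
As a rough CPU illustration of the Gauss elimination idea behind the sweep solver listed for the tridiagonal sample, here is a minimal Python sketch of the classic forward-elimination and back-substitution method (often called the Thomas algorithm) for one system. It does not include the sample's reordering for coalescing, uses no pivoting, and the function name `solve_tridiagonal` and its argument layout are assumptions.

```python
def solve_tridiagonal(a, b, c, d):
    """Solve one tridiagonal system without pivoting.

    a: sub-diagonal (a[0] is unused), b: main diagonal,
    c: super-diagonal (c[-1] is unused), d: right-hand side.
    """
    n = len(d)
    cp = [0.0] * n  # modified super-diagonal
    dp = [0.0] * n  # modified right-hand side
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):  # forward elimination
        denom = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / denom
        dp[i] = (d[i] - a[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


# 3x3 system with diagonal 2 and off-diagonals -1; the solution is 1, 2, 3.
print(solve_tridiagonal([0, -1, -1], [2, 2, 2], [-1, -1, 0], [0, 0, 4]))
```

With many small independent systems, a GPU version would typically assign one system to each work-item and run this loop per system.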


OpenCL Device Query
This sample enumerates the properties of the OpenCL devices present in the system.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac


OpenCL Bandwidth Test
This is a simple test program to measure the memcopy bandwidth of the GPU. It can currently measure device-to-device copy bandwidth, and host-to-device and device-to-host copy bandwidth for pageable and page-locked memory, using memory-mapped or direct access.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac


OpenCL Vector Addition
Element-by-element addition of two 1-dimensional arrays. Implemented in OpenCL for CUDA GPUs, with functional comparison against a simple C++ host CPU implementation.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac


OpenCL Dot Product
Dot product (scalar product) of a set of input vector pairs. Implemented in OpenCL for CUDA GPUs, with functional comparison against a simple C++ host CPU implementation.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac


OpenCL Matrix Vector Multiplication
Simple matrix-vector multiplication example showing increasingly optimized implementations.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac


OpenCL Overlapped Copy/Compute Sample
Element-by-element hypotenuse for two 1-dimensional arrays. Implemented in OpenCL for CUDA GPUs, with functional comparison against a simple C++ host CPU implementation. Demonstrates overlapped copy/compute in 2 command queues.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac


OpenCL Simple Multi-GPU
This application demonstrates how to make use of multiple GPUs in OpenCL.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac


OpenCL Simple OpenGL Interop
Simple program which demonstrates interoperability between OpenCL and OpenGL. The program modifies vertex positions with OpenCL and uses OpenGL to render the geometry.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac


Simple OpenCL D3D10 Texture
Simple program which demonstrates Direct3D10 texture interoperability with OpenCL. The program creates a number of D3D10 textures (2D, 3D, and CubeMap) which are written to from OpenCL kernels. Direct3D then renders the results on the screen.

Download - Windows (x86)
Download - Windows (x64)


Simple OpenCL D3D9 Texture
Simple program which demonstrates Direct3D9 texture interoperability with OpenCL. The program creates a number of D3D9 textures (2D, 3D, and CubeMap) which are written to from OpenCL kernels. Direct3D then renders the results on the screen.

Download - Windows (x86)
Download - Windows (x64)


OpenCL Scan
This example demonstrates an efficient OpenCL implementation of parallel prefix sum, also known as "scan". Given an array of numbers, scan computes a new array in which each element is the sum of all the elements before it in the input array.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
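
The scan definition given for this sample (each output element is the sum of all input elements before it) is an exclusive prefix sum. The Python sketch below shows one well-known parallel-friendly formulation, the up-sweep/down-sweep scheme usually attributed to Blelloch, simulated sequentially on the CPU. It is an illustration only and is not claimed to be the method used by the SDK sample; the function name and the power-of-two length restriction are assumptions of this sketch.

```python
def blelloch_exclusive_scan(values):
    """Exclusive scan via up-sweep and down-sweep; length must be a power of two."""
    data = list(values)
    n = len(data)
    if n == 0 or (n & (n - 1)) != 0:
        raise ValueError("length must be a non-zero power of two")
    stride = 1
    while stride < n:  # up-sweep: build partial sums in place
        for i in range(2 * stride - 1, n, 2 * stride):
            data[i] += data[i - stride]
        stride *= 2
    data[n - 1] = 0  # identity element seeds the down-sweep
    stride = n // 2
    while stride >= 1:  # down-sweep: push prefixes back down the tree
        for i in range(2 * stride - 1, n, 2 * stride):
            left = data[i - stride]
            data[i - stride] = data[i]
            data[i] += left
        stride //= 2
    return data


print(blelloch_exclusive_scan([3, 1, 7, 0, 4, 1, 6, 3]))
# [0, 3, 4, 11, 11, 15, 16, 22]
```

Within each pass of the inner `for` loop the iterations touch disjoint elements, so on a GPU each pass could be executed by independent work-items with a barrier between passes.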


OpenCL Parallel Reduction
A parallel sum reduction that computes the sum of large arrays of values. This sample demonstrates several important optimization strategies for parallel algorithms like reduction.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
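
The basic pattern behind a parallel sum reduction is a tree of pairwise additions. The following Python sketch simulates that tree on the CPU; it shows only the core idea and none of the optimization strategies the reduction sample refers to. The function name `tree_reduce_sum` is an assumption.

```python
def tree_reduce_sum(values):
    """Sum values with log2(n) passes of pairwise additions."""
    data = list(values)
    n = len(data)
    if n == 0:
        return 0
    stride = 1
    while stride < n:
        # Each iteration of this loop is independent of the others in
        # the same pass, so a GPU could run them as separate work-items.
        for i in range(0, n - stride, 2 * stride):
            data[i] += data[i + stride]
        stride *= 2
    return data[0]


print(tree_reduce_sum([1, 2, 3, 4, 5]))  # 15
```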


OpenCL Matrix Transpose
Efficient matrix transpose.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac


OpenCL Matrix Multiplication
This sample implements matrix multiplication and is exactly the same as Chapter 6 of the programming guide. It has been written for clarity of exposition to illustrate various OpenCL programming principles, not with the goal of providing the most performant generic kernel for matrix multiplication. CUBLAS provides high-performance matrix multiplication.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
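
For readers who want a feel for how matrix multiplication is commonly structured for GPUs, the Python sketch below computes the product tile by tile, which mirrors the usual approach of staging sub-blocks in fast local memory. That the SDK sample uses tiling is an assumption here, and the function name `matmul_tiled` and the `tile` parameter are hypothetical.

```python
def matmul_tiled(a, b, tile=2):
    """Multiply a (n x k) by b (k x m) one tile at a time."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            # On a GPU, one work-group would own this output tile.
            for k0 in range(0, k, tile):
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for kk in range(k0, min(k0 + tile, k)):
                            acc += a[i][kk] * b[kk][j]
                        c[i][j] = acc
    return c


print(matmul_tiled([[1, 2, 3], [4, 5, 6]], [[7, 8], [9, 10], [11, 12]]))
# [[58.0, 64.0], [139.0, 154.0]]
```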


OpenCL 3D FDTD
This sample applies a finite-difference time-domain (FDTD) progression stencil on a 3D surface.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac


OpenCL DCT 8x8
This sample demonstrates how the Discrete Cosine Transform (DCT) for 8x8 blocks can be implemented in OpenCL.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
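
As a reference point for the 8x8 DCT, the Python sketch below computes the orthonormal two-dimensional DCT-II of one block by applying a one-dimensional transform to the rows and then to the columns. The choice of the orthonormal scaling and the function names are assumptions of this sketch; the SDK sample may use a different normalization or a faster factorization.

```python
import math

N = 8  # block size


def dct_1d(x):
    """Orthonormal DCT-II of a length-8 sequence."""
    out = []
    for k in range(N):
        scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        total = sum(
            x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
            for n in range(N)
        )
        out.append(scale * total)
    return out


def dct_8x8(block):
    """2D DCT-II of an 8x8 block: transform rows, then columns."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d([rows[i][j] for i in range(N)]) for j in range(N)]
    # cols[j][i] holds coefficient (i, j); transpose back to row-major.
    return [[cols[j][i] for j in range(N)] for i in range(N)]


flat = [[1.0] * N for _ in range(N)]
coeffs = dct_8x8(flat)
print(round(coeffs[0][0], 6))  # 8.0: a constant block has only a DC term
```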


OpenCL DirectX Texture Compressor (DXTC)
High Quality DXT Compression using OpenCL. This example shows how to implement an existing computationally-intensive CPU compression algorithm in parallel on the GPU, and obtain an order of magnitude performance improvement.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac


OpenCL Radix Sort
This sample demonstrates a very fast and efficient parallel radix sort implemented in OpenCL for CUDA GPUs.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
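
For orientation, a least-significant-digit radix sort can be written on the CPU as repeated counting passes: histogram the current digit, take an exclusive prefix sum of the counts, and scatter keys to their slots. The Python sketch below follows that outline for non-negative integer keys. It is not the SDK's implementation, and the function name and the `bits_per_pass` and `key_bits` parameters are assumptions.

```python
def radix_sort(keys, bits_per_pass=4, key_bits=32):
    """Stable LSD radix sort for non-negative integers below 2**key_bits."""
    radix = 1 << bits_per_pass
    mask = radix - 1
    keys = list(keys)
    for shift in range(0, key_bits, bits_per_pass):
        counts = [0] * radix
        for k in keys:  # histogram of the current digit
            counts[(k >> shift) & mask] += 1
        offsets, running = [], 0
        for c in counts:  # exclusive prefix sum gives bucket starts
            offsets.append(running)
            running += c
        out = [0] * len(keys)
        for k in keys:  # stable scatter into buckets
            digit = (k >> shift) & mask
            out[offsets[digit]] = k
            offsets[digit] += 1
        keys = out
    return keys


print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
# [2, 24, 45, 66, 75, 90, 170, 802]
```

The histogram and prefix-sum steps are the parts that GPU implementations typically parallelize, which is why radix sort is often built on top of a scan primitive.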


OpenCL Sorting Networks
This sample implements the bitonic sort algorithm for batches of short arrays.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
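
A bitonic sorting network performs a fixed sequence of compare-and-swap steps that does not depend on the data, which is what makes it attractive for GPUs. The Python sketch below simulates the network for a single array whose length is a power of two. The function name is an assumption, and the batching over many short arrays described for the sample is not shown.

```python
def bitonic_sort(values):
    """Sort ascending with a bitonic network; length must be a power of two."""
    data = list(values)
    n = len(data)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    k = 2
    while k <= n:  # size of the bitonic sequences being merged
        j = k // 2
        while j >= 1:  # distance between compared elements
            for i in range(n):  # on a GPU: one work-item per index i
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    out_of_order = (
                        data[i] > data[partner]
                        if ascending
                        else data[i] < data[partner]
                    )
                    if out_of_order:
                        data[i], data[partner] = data[partner], data[i]
            j //= 2
        k *= 2
    return data


print(bitonic_sort([7, 3, 9, 1, 8, 2, 6, 4]))  # [1, 2, 3, 4, 6, 7, 8, 9]
```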


OpenCL Black-Scholes Option Pricing
This sample evaluates fair call and put prices for a given set of European options using the Black-Scholes formula.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
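
The Black-Scholes formula for one European option can be written out directly. The Python sketch below prices a call and a put for a non-dividend-paying underlying, using the error function for the normal CDF. The function names, parameter names, and the example inputs are assumptions; the SDK sample may approximate the CDF differently.

```python
import math


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes(spot, strike, years, rate, vol):
    """Return (call, put) prices for a European option without dividends."""
    sqrt_t = math.sqrt(years)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol * vol) * years) / (vol * sqrt_t)
    d2 = d1 - vol * sqrt_t
    discount = math.exp(-rate * years)
    call = spot * norm_cdf(d1) - strike * discount * norm_cdf(d2)
    put = strike * discount * norm_cdf(-d2) - spot * norm_cdf(-d1)
    return call, put


call, put = black_scholes(spot=100.0, strike=100.0, years=1.0, rate=0.05, vol=0.2)
print(round(call, 4), round(put, 4))
```

Each option is priced independently of the others, so a GPU version would naturally map one option to one work-item.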


OpenCL Hidden Markov Model
This sample implements a Hidden Markov Model in OpenCL for the GPU.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac


OpenCL Quasirandom Generator
This sample implements the Niederreiter quasirandom number generator and Moro's Inverse Cumulative Normal Distribution generator.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac


OpenCL Mersenne Twister
This sample implements the Mersenne Twister random number generator and the Cartesian Box-Muller transformation on the GPU.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
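
To show how uniform random numbers are turned into normally distributed ones, the Python sketch below applies the basic trigonometric form of the Box-Muller transform to pairs of uniform samples. It relies on Python's `random` module, whose core generator is a Mersenne Twister. Treating this form as the one the sample calls "Cartesian" is an assumption, as are the function names and the seed.

```python
import math
import random


def box_muller(u1, u2):
    """Map two uniform samples (u1 in (0, 1], u2 in [0, 1)) to two normals."""
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)


def sample_normals(pairs, seed=12345):
    """Draw 2 * pairs standard normal samples from a seeded generator."""
    rng = random.Random(seed)
    samples = []
    for _ in range(pairs):
        u1 = 1.0 - rng.random()  # shift [0, 1) to (0, 1] so log() is defined
        z0, z1 = box_muller(u1, rng.random())
        samples.extend((z0, z1))
    return samples


values = sample_normals(50000)
mean = sum(values) / len(values)
variance = sum((v - mean) ** 2 for v in values) / len(values)
print(mean, variance)  # should be close to 0 and 1
```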


OpenCL 64-bin and 256-bin Histogram
This sample demonstrates an efficient implementation of 64-bin and 256-bin histograms.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
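
A common way to build histograms in parallel is to let each group of workers fill a private sub-histogram and then merge the partial results, which avoids contention on shared bins. The Python sketch below simulates that idea for a 256-bin histogram of byte values. Whether the SDK sample is organized this way is an assumption, and the function name and `num_partitions` parameter are hypothetical.

```python
def histogram256(data, num_partitions=4):
    """Count byte values using private sub-histograms that are merged at the end."""
    partials = [[0] * 256 for _ in range(num_partitions)]
    for i, byte in enumerate(data):
        # Each partition stands in for one work-group with its own bins.
        partials[i % num_partitions][byte] += 1
    # Merge step: sum the same bin across all partial histograms.
    return [sum(p[b] for p in partials) for b in range(256)]


hist = histogram256(bytes([0, 1, 1, 255, 255, 255]))
print(hist[0], hist[1], hist[255])  # 1 2 3
```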


OpenCL Post-Process OpenGL-Rendered Image
This sample shows how to post-process an image rendered in OpenGL using OpenCL.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac


OpenCL Simple Texture 3D
Simple example that demonstrates use of 3D textures in OpenCL.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac


OpenCL Box Filter
Linear 2-dimensional variable-width Box Filter of RGBA image. Implemented in OpenCL for CUDA GPUs, with performance comparison against simple C++ on host CPU. Each of the R, G, B and A channels is treated independently, with results computed concurrently for each.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
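
A two-dimensional box filter is separable, so it can be computed as a horizontal averaging pass followed by a vertical one. The Python sketch below does this for a single channel, using prefix sums so that the cost per pixel does not depend on the filter width. The border handling (windows clipped at the image edge) and the function names are assumptions of this sketch, not a description of the SDK sample.

```python
def box_1d(line, radius):
    """Mean over the window [i - radius, i + radius], clipped at the ends."""
    prefix = [0.0]
    for v in line:
        prefix.append(prefix[-1] + v)
    n = len(line)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return out


def box_2d(channel, radius):
    """Box-filter one image channel: rows first, then columns."""
    rows = [box_1d(r, radius) for r in channel]
    cols = [box_1d(list(c), radius) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


print(box_1d([0, 0, 9, 0, 0], radius=1))  # [0.0, 3.0, 3.0, 3.0, 0.0]
```

For an RGBA image, the same function would be applied to each of the four channels separately.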


OpenCL Sobel Filter
2-dimensional 3x3 Sobel Magnitude Filter of RGBA image. Implemented in OpenCL for CUDA GPUs, with performance comparison against simple C++ on host CPU. Gradient magnitude for each of the R, G & B channels is computed concurrently and independently, then combined into a single gradient intensity with linear weighting factors.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
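
The per-channel gradient magnitude and the weighted combination described for the Sobel sample can be sketched as follows in Python for one interior pixel. The standard 3x3 Sobel kernels are used; the channel weights shown are hypothetical placeholder values, and the function names are assumptions.

```python
import math

GX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]  # horizontal gradient kernel
GY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]  # vertical gradient kernel


def sobel_magnitude(channel, x, y):
    """Gradient magnitude of one channel at interior pixel (x, y)."""
    gx = gy = 0.0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            p = channel[y + dy][x + dx]
            gx += GX[dy + 1][dx + 1] * p
            gy += GY[dy + 1][dx + 1] * p
    return math.hypot(gx, gy)


def sobel_rgb(r, g, b, x, y, weights=(0.30, 0.59, 0.11)):
    """Combine per-channel magnitudes linearly; the weights are illustrative."""
    mags = [sobel_magnitude(c, x, y) for c in (r, g, b)]
    return sum(w * m for w, m in zip(weights, mags))


edge = [[0, 0, 1], [0, 0, 1], [0, 0, 1]]  # vertical edge on the right
print(sobel_magnitude(edge, 1, 1))  # 4.0
```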


OpenCL Median Filter
Multi-GPU enabled, 2-dimensional 3x3 Median Filter of RGBA image. Implemented in OpenCL for CUDA GPUs, with performance comparison against simple C++ on host CPU. Each of the R, G & B channels is treated independently, with results computed concurrently for each.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
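
A 3x3 median filter replaces each pixel with the median of its nine-pixel neighborhood. The Python sketch below applies it to one channel and leaves border pixels unchanged; that border policy and the function names are assumptions of this sketch rather than details taken from the SDK sample.

```python
def median3x3(channel, x, y):
    """Median of the 3x3 neighborhood around interior pixel (x, y)."""
    window = sorted(
        channel[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)
    )
    return window[4]  # middle of nine sorted values


def median_filter(channel):
    """Filter one channel; border pixels are copied through unchanged."""
    height, width = len(channel), len(channel[0])
    out = [row[:] for row in channel]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            out[y][x] = median3x3(channel, x, y)
    return out


noisy = [[10, 10, 10], [10, 255, 10], [10, 10, 10]]
print(median_filter(noisy)[1][1])  # 10: the isolated outlier is removed
```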


OpenCL Separable Convolution
This sample implements a convolution filter of a 2D image with an arbitrary separable kernel.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
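
A separable kernel can be written as the outer product of a row kernel and a column kernel, so the 2D convolution reduces to a 1D pass over the rows followed by a 1D pass over the columns. The Python sketch below illustrates this with zero padding at the borders. The padding choice and the function names are assumptions, and odd kernel lengths are assumed.

```python
def convolve_1d(line, kernel):
    """1D convolution with an odd-length kernel, zero padding outside the line."""
    radius = len(kernel) // 2
    n = len(line)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + radius - k  # convolution flips the kernel
            if 0 <= j < n:
                acc += w * line[j]
        out.append(acc)
    return out


def convolve_separable(image, row_kernel, col_kernel):
    """Apply row_kernel along rows, then col_kernel along columns."""
    rows = [convolve_1d(r, row_kernel) for r in image]
    cols = [convolve_1d(list(c), col_kernel) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


impulse = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
print(convolve_separable(impulse, [1, 2, 1], [1, 2, 1]))
# [[1.0, 2.0, 1.0], [2.0, 4.0, 2.0], [1.0, 2.0, 1.0]]
```

Convolving an impulse reproduces the full 2D kernel, which is a quick way to check that the two 1D passes combine as expected.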


OpenCL Recursive Gaussian Filter
2-dimensional Gaussian Blur Filter of RGBA image using IRF method. Implemented in OpenCL for CUDA GPUs, with performance comparison against simple C++ on host CPU. Each of the R, G, B and A channels is treated independently, with results computed concurrently for each.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac


OpenCL Volume rendering
This sample demonstrates basic volume rendering using 3D textures.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac


OpenCL Particle Collision Simulation
Simulation of elastic collisions of a large number of bodies. Implemented in OpenCL for CUDA GPUs.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac


OpenCL N-Body Physics Simulation
Gravitational simulation of a large number of bodies. Implemented in OpenCL for CUDA GPUs.

Download - Windows (x86)
Download - Windows (x64)
Download - Linux/Mac
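
The core of a gravitational N-body simulation is the all-pairs acceleration computation. The Python sketch below evaluates it directly on the CPU with a softening term to avoid division by zero for close bodies. The function name, the `softening` and `g` parameters, and the unit system are assumptions; the time integration step is not shown.

```python
import math


def accelerations(positions, masses, softening=1e-3, g=1.0):
    """Return the gravitational acceleration on each body from all others."""
    n = len(positions)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):  # on a GPU: one work-item per body i
        for j in range(n):
            if i == j:
                continue
            d = [positions[j][k] - positions[i][k] for k in range(3)]
            dist2 = d[0] ** 2 + d[1] ** 2 + d[2] ** 2 + softening ** 2
            scale = g * masses[j] / (dist2 * math.sqrt(dist2))
            for k in range(3):
                acc[i][k] += scale * d[k]
    return acc


# Two unit masses one unit apart attract each other with equal magnitude.
print(accelerations([(0, 0, 0), (1, 0, 0)], [1.0, 1.0], softening=0.0))
# [[1.0, 0.0, 0.0], [-1.0, 0.0, 0.0]]
```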