Below is the list of samples that the Devtech Proviz team has made available at

Each sample is a separate GitHub repository, so that people can “cherry-pick” what they want. You can run cmake on each sample separately. If you want to build all the samples you downloaded together in a single project (a Visual Studio “Solution”, for example), clone the repository called “build_all” and run cmake on it: cmake will scan the parent folder for available samples and put them together.


Note1: samples often require data located in 2-3 shared repositories:

  • shared_sources : contains helpers (math, cmake scripts...) that the samples require to build
  • shared_external : a collection of external tools (zlib, AntTweakBar...) that some samples may require to compile. This folder is not mandatory, but it saves you from hunting for the proper packages on the web

Some samples may provide these shared repositories as git “sub-modules”, which makes them self-sufficient. It is still a good idea to have the shared folders cloned for the samples that don’t have sub-modules set up.

Note2: Linux users must install GLFW and make sure the checkbox USE_GLFW is checked in the cmake settings.

Vulkan Samples


This sample shows a Vulkan implementation of the super-sampling technique explained here

The source code contains two implementations: the original OpenGL implementation and a Vulkan implementation.

This sample also shows a way to write Vulkan code more compactly, by using constructors and functors. Details of this part are in NVK.cpp.



This example shows how to write some Vulkan code and how to implement a multi-threading approach based on thread-workers.

Thread-workers are spawned to build parts of the scene, which is divided into smaller chunks. Each thread-worker is responsible for writing secondary command buffers.

When all workers are done, the main thread gathers these command buffers and integrates them into the primary command buffer.
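The fan-out and gather pattern described above can be sketched without any graphics API. This is only an illustration, not the sample's code: plain Python lists stand in for command buffers, and the names `record_secondary` and `build_primary`, the chunk size and the worker count are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def record_secondary(chunk_id, objects):
    """Worker: record the draw commands of one chunk into its own buffer."""
    # Assumption: a list of strings stands in for a secondary command buffer.
    return [f"draw chunk={chunk_id} object={obj}" for obj in objects]


def build_primary(scene, chunk_size, num_workers=4):
    """Main thread: split the scene, fan out to workers, gather the results."""
    chunks = [scene[i:i + chunk_size] for i in range(0, len(scene), chunk_size)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # map() yields results in submission order, so the primary buffer is
        # deterministic regardless of which worker finishes first.
        secondaries = list(pool.map(record_secondary, range(len(chunks)), chunks))
    primary = ["begin render pass"]
    for secondary in secondaries:
        # In Vulkan this step would correspond to vkCmdExecuteCommands.
        primary.append(("execute", secondary))
    primary.append("end render pass")
    return primary
```

Each worker writes only to its own buffer, so no locking is needed while recording; the only synchronization point is the gather on the main thread.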

This sample also shows a way to write Vulkan code more compactly, by using constructors and functors. Details of this part are in NVK.cpp.



This sample sets up the Vulkan device, queue, etc., loads a model from a bespoke file format along with its associated materials and textures, and renders with a single thread.



The sample allows comparing various rendering approaches using core OpenGL, extended OpenGL via bindless and NV_command_list, as well as Vulkan. It currently makes use of NVIDIA-specific extensions to use Vulkan within an OpenGL context and display a Vulkan image.

The content being rendered in the sample is a CAD model made of many parts that have few triangles. Such low complexity per draw call very often results in the application being CPU bound. The sample is a fork of the public cadscene OpenGL sample and introduces the usage of multiple CPU threads.

We also published a blog post on this sample:


Vulkan Fish Tornado (External contribution)

Fish Tornado from the Cinder team (special thanks to Hai Nguyen) takes full advantage of the Vulkan API to render a school of big eye trevally. Originally written by Robert Hodgin and based on animal behavior simulation research by Craig Reynolds and Professor Iain Couzin, the demo shows different schooling patterns. Periodically their peaceful swimming is interrupted by the predatory approach of a passing shark.

The screenshots presented here are rendered using Vulkan and built using Cinder, an open source C++ framework for creative coding. Originally written against OpenGL, Fish Tornado was ported to Vulkan during the initial implementation of Cinder’s Vulkan support.


Other Samples


This sample shows how to precompute ambient occlusion with OptiX Prime, and store it on the vertices of a mesh for use during final shading with OpenGL. The steps are as follows:

  • Distribute sample points over a mesh. We place a minimum number of points per triangle, then use area-based sampling for any remaining points. The total number of samples is a user parameter.
  • Compute ambient occlusion at the sample points. To help limit memory usage, we shoot rays in multiple batches using OptiX Prime. Each batch has a single jittered ray on a subset of the sample points. Geometry can be instanced and/or marked as a blocker which occludes rays but does not receive sample points of its own.
  • Resample occlusion from sample points to vertices. If the external Eigen 3 template library was found during CMake configuration, then we use the filtering method from "Least Squares Vertex Baking" (Kavan et al, EGSR 2011). Eigen is open source. In the absence of Eigen support, we use simpler barycentric resampling. This shows more visual artifacts, especially when the input mesh has large triangles. A copy of Eigen is included in the "eigen" subdirectory and will be used by default.
  • Visualize occlusion in OpenGL as a vertex attribute.
  • GitHub
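The first step above (a minimum number of points per triangle, then area-based sampling for the rest) can be sketched as follows. This is a simplified CPU illustration, not the sample's implementation: the function names, the `min_per_triangle` default and the use of Python's `random` module are assumptions.

```python
import math
import random


def triangle_area(a, b, c):
    """Area of a 3D triangle: half the length of the edge cross product."""
    u = [b[i] - a[i] for i in range(3)]
    v = [c[i] - a[i] for i in range(3)]
    cross = (u[1] * v[2] - u[2] * v[1],
             u[2] * v[0] - u[0] * v[2],
             u[0] * v[1] - u[1] * v[0])
    return 0.5 * math.sqrt(sum(x * x for x in cross))


def sample_counts(triangles, total, min_per_triangle=1, rng=random):
    """Give every triangle a minimum, then spread the remainder by area."""
    counts = [min_per_triangle] * len(triangles)
    remaining = total - min_per_triangle * len(triangles)
    if remaining > 0:
        areas = [triangle_area(*t) for t in triangles]
        # Larger triangles are proportionally more likely to get a sample.
        for idx in rng.choices(range(len(triangles)), weights=areas, k=remaining):
            counts[idx] += 1
    return counts


def random_point(a, b, c, rng=random):
    """Uniform point inside a triangle via square-root barycentric mapping."""
    r1, r2 = rng.random(), rng.random()
    s = math.sqrt(r1)
    w = (1.0 - s, s * (1.0 - r2), s * r2)  # barycentric weights, sum to 1
    return tuple(w[0] * a[i] + w[1] * b[i] + w[2] * c[i] for i in range(3))
```

In this sketch the per-triangle minimum takes priority: if `total` is smaller than the number of triangles times the minimum, more than `total` points are produced. Whether the sample resolves that case the same way is not stated in the description.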


    (Added on 5/8/2015)

    This is a small sample that demonstrates the most efficient way to use the CUDA-OpenGL interop API in a single-threaded manner.

    This example uses CUDA to compute a temperature scalar field that is updated every frame. The visual result is a 256 x 256 x 256 uniform grid, rendered in OpenGL with a basic ray-marching fragment shader.

    The CUDA compute part is a simple heat propagator. Since the result at every time step depends on the result of the previous frame, we ping-pong the 3D texture resource handles back and forth every frame.
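The ping-pong idea can be illustrated on the CPU with two plain buffers. The sketch below is not the sample's kernel: the 6-neighbour stencil, the diffusion factor `alpha` and the clamped borders are assumptions chosen for illustration, and flat Python lists stand in for the two 3D textures.

```python
def step(src, dst, n, alpha=0.1):
    """One explicit diffusion step on an n*n*n grid: read src, write dst."""
    def at(x, y, z):
        # Assumption: clamp coordinates so no heat leaves through the borders.
        x = min(max(x, 0), n - 1)
        y = min(max(y, 0), n - 1)
        z = min(max(z, 0), n - 1)
        return src[(z * n + y) * n + x]

    for z in range(n):
        for y in range(n):
            for x in range(n):
                c = at(x, y, z)
                neighbours = (at(x - 1, y, z) + at(x + 1, y, z) +
                              at(x, y - 1, z) + at(x, y + 1, z) +
                              at(x, y, z - 1) + at(x, y, z + 1))
                dst[(z * n + y) * n + x] = c + alpha * (neighbours - 6.0 * c)


def simulate(n, frames, hot_cell, alpha=0.1):
    """Run `frames` steps, swapping the read and write buffers each frame."""
    read = [0.0] * (n * n * n)
    write = [0.0] * (n * n * n)
    read[hot_cell] = 1.0
    for _ in range(frames):
        step(read, write, n, alpha)
        read, write = write, read  # ping-pong: last output is the next input
    return read
```

Only the two references are swapped; no data is copied between frames. The step never reads from the buffer it writes to, which is why two buffers are needed in the first place.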



    (Added on 5/31/2016)

    This is a small sample that demonstrates explicit OpenCL-OpenGL synchronization in a single-threaded manner.

    This sample is the same as the previous CUDA sample, but uses OpenCL.



    This sample implements several scene rendering techniques that mostly target static data, such as is often found in CAD or DCC applications. In this context, static means that the vertex and index buffers for the scene's objects hardly change. It is still fine to edit the geometry of a few objects in the scene, but mostly it is the matrix and material values that change across frames. Imagine editing the wheel topology of a car, or positioning an engine: the rest of the assembly is not modified. The principal OpenGL mechanisms used here are described in the presentation slides of SIGGRAPH 2014. It is highly recommended to go through the slides first.

    The sample makes use of multiple OpenGL 4 core features, such as ARB_multi_draw_indirect, but also showcases OpenGL 3 style rendering techniques.

    There are also several techniques built around the NV_command_list extension. Please refer to gl_commandlist_basic for an introduction to NV_command_list.

    Note: This is just a sample that illustrates several techniques and possible approaches to rendering; its purpose is not to provide production-level, highly optimized implementations.


    gl commandlist basic

    In this sample the NV_command_list extension is used to render a basic scene (variant of gl_simple_pipeline sample) and texturing is performed via ARB_bindless_texture.



    With the addition of indirect rendering (ARB_draw_indirect and ARB_multi_draw_indirect) OpenGL got an efficient mechanism that allows the GPU to create or modify its own work without stalling the pipeline. As CPU and GPU are best used when working asynchronously, avoiding readbacks to CPU to drive decision making is beneficial.

    In this sample we use ARB_draw_indirect and ARB_shader_atomic_counters to build three distinct render lists for drawing particles as spheres, each using a different shader and representing a different level of detail (LOD): draw as a point, draw as an instanced low-resolution mesh, or draw as an instanced, adaptively tessellated mesh.
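The list-building step can be mimicked on the CPU to show the role of the counters. This is a sketch under assumptions, not the sample's shader: the distance thresholds `near` and `far`, the function name and the distance-only LOD criterion are hypothetical.

```python
import math

LOD_POINT, LOD_LOW_MESH, LOD_TESSELLATED = 0, 1, 2


def build_lod_lists(particles, eye, near=10.0, far=50.0):
    """Sort particle indices into three render lists by distance to the eye."""
    capacity = len(particles)
    lists = [[None] * capacity for _ in range(3)]  # preallocated, like buffers
    counters = [0, 0, 0]                           # stand-ins for atomic counters
    for index, position in enumerate(particles):
        distance = math.dist(position, eye)
        if distance > far:
            lod = LOD_POINT
        elif distance > near:
            lod = LOD_LOW_MESH
        else:
            lod = LOD_TESSELLATED
        # In a shader, an atomic increment (atomicCounterIncrement in GLSL)
        # would hand out a unique slot to each invocation.
        slot = counters[lod]
        counters[lod] = slot + 1
        lists[lod][slot] = index
    return [lst[:count] for lst, count in zip(lists, counters)], counters
```

On the GPU the final counter values would stay in buffer memory and feed the counts of the indirect draw commands, so the CPU never needs to read them back.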



    This sample implements a batched occlusion culling system, which is no longer based on individual occlusion queries but uses shaders to cull many boxes at once. The principal algorithms are also illustrated towards the end of the presentation slides of the GTC 2014 and SIGGRAPH 2014 talks.

    It leverages the ARB_multi_draw_indirect (MDI) extension to implement latency-free occlusion culling. The MDI technique works well with a simplified scene setup where all geometry is stored in one big VBO/IBO pairing and no shader changes are done in between.
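One way to picture how culling results can drive multi-draw-indirect without a CPU readback is to patch the instance count of each draw command. The sketch below is an assumption-laden CPU illustration, not the sample's algorithm: the command fields follow the OpenGL indirect draw layout, but the depth-grid visibility test, the box format `(x0, y0, x1, y1, nearest_depth)` and the function name `cull` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DrawElementsIndirectCommand:
    count: int           # number of indices to draw
    instance_count: int  # 0 turns the draw into a no-op
    first_index: int
    base_vertex: int
    base_instance: int


def cull(boxes, depth, commands):
    """Test every box against a depth grid and patch its draw command."""
    for box, cmd in zip(boxes, commands):
        x0, y0, x1, y1, nearest = box
        # Assumption: a box is visible if, at any covered texel, its nearest
        # depth is not behind the depth already stored there.
        visible = any(nearest <= depth[y][x]
                      for y in range(y0, y1 + 1)
                      for x in range(x0, x1 + 1))
        cmd.instance_count = 1 if visible else 0
    return commands
```

Because culled objects keep their slot in the command buffer and simply draw zero instances, the number of commands submitted stays fixed and only buffer contents change from frame to frame.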

    The slides mention that this approach could be extended to use NV_bindless_multi_draw_indirect to render draw calls using different VBO/IBOs in one go. With the upcoming NV_command_list, however, an even better approach is possible; it is also implemented in the sample and allows more flexible state changes. Please refer to gl_commandlist_basic for an introduction to NV_command_list.



    This sample shows how to use the NVIDIA path rendering extension.

    It also shows how to use path rendering with FBOs (framebuffer objects) and how to work with a CMYK-Alpha format.



    This sample implements screen space ambient occlusion (SSAO) using horizon-based ambient occlusion (HBAO). You can find some details about HBAO here. It provides two alternative implementations: the original HBAO, and an enhanced version that is more efficient because it makes better use of the hardware's texture sampling cache through de-interleaved texturing.
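De-interleaving itself is a simple data rearrangement, sketched below on nested lists. The 2 x 2 split and the function names are assumptions for illustration; the description above does not state which layout the sample uses.

```python
def deinterleave(image, n=2):
    """Split a 2D image into n*n layers; layer (j, i) keeps every n-th texel."""
    return [[row[i::n] for row in image[j::n]]
            for j in range(n) for i in range(n)]


def interleave(layers, n=2):
    """Inverse of deinterleave: weave the n*n layers back into one image."""
    height = sum(len(layers[j * n]) for j in range(n))
    width = sum(len(layers[i][0]) for i in range(n))
    image = [[None] * width for _ in range(height)]
    for j in range(n):
        for i in range(n):
            for y, row in enumerate(layers[j * n + i]):
                for x, value in enumerate(row):
                    image[y * n + j][x * n + i] = value
    return image
```

The occlusion pass would then run once per low-resolution layer, and the results would be woven back together with the inverse operation. The likely benefit is that neighbouring texels within one layer fetch from nearby locations, which is friendlier to the texture cache than scattered per-pixel sampling at full resolution.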


    OptiX Advanced Samples

    This is a set of advanced samples for the NVIDIA OptiX Ray Tracing Engine. They assume some previous experience with OptiX and physically based rendering, unlike the basic tutorial-style samples in the SDK directory of the OptiX 4.0 distribution. They also use some different libraries than the SDK samples; GLFW and imgui in place of GLUT, for example. This means you cannot generally copy one of the advanced samples directly into the SDK, and vice versa.


    NVIDIA GPU Blur Plug-in for Adobe After Effects

    The NVIDIA GPU Blur Plug-in performs fast blur for video editing in After Effects using CUDA. The plug-in is based on the CUDA Toolkit sample Box Filter, adapted to perform multiple iterations for high quality, and provides both a GPU pathway and a CPU fallback. The sample also demonstrates how to do self-profiling, displaying a console window that shows CPU and GPU timings. Performance is 12x faster than the single-core CPU fallback. This sample provides a basis for developing additional plug-ins for After Effects using CUDA.
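The idea of iterating a box filter for quality can be shown in one dimension. This is a minimal CPU sketch, not the plug-in's code: the function names, the edge clamping and the default of three iterations are assumptions.

```python
def box_blur(signal, radius):
    """One box-filter pass with edge clamping, using a running sum."""
    n = len(signal)
    width = 2 * radius + 1

    def clamp(i):
        # Assumption: samples outside the signal repeat the edge value.
        return signal[min(max(i, 0), n - 1)]

    total = sum(clamp(i) for i in range(-radius, radius + 1))
    out = []
    for x in range(n):
        out.append(total / width)
        # Slide the window: add the entering sample, drop the leaving one.
        total += clamp(x + radius + 1) - clamp(x - radius)
    return out


def iterated_blur(signal, radius, iterations=3):
    """Repeat the box filter; each pass smooths the result further."""
    for _ in range(iterations):
        signal = box_blur(signal, radius)
    return signal
```

A single box pass has a flat kernel, which leaves visible blockiness. Repeating it makes the effective kernel progressively more bell-shaped, approaching a Gaussian, while the running sum keeps the cost of each pass independent of the radius.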

    Download the plug-in here.