VRWorks Audio SDK in-depth

Introduction

The proliferation of VR use cases in gaming and professional visualization is placing increasing demands on realism and user immersion. This is particularly true in gaming, where the absence of subtle environmental cues can make a significant difference to a player's interest in a game, and hence to the game's success.

To deliver a fully immersive user experience in VR, two of the most important human senses, vision and hearing, need to be in sync with the virtual environment being modeled. With recent advances in hardware-accelerated features in NVIDIA GPUs, the “visual fidelity” of rendered graphics has improved dramatically, and it continues to improve with every hardware and software release.

Auditory modeling for VR has been discussed in the literature for several years. The concept of path-traced (geometric) audio has been around since the 1980s, but practical implementations were limited to academic and research-oriented explorations, mainly because the computational resources needed to meet real-time requirements were not available. With advances in GPU architecture, density and speed, it is now possible to perform acoustic path tracing in real time without significantly affecting graphics rendering. NVIDIA VRWorks Audio technology takes advantage of these advances and provides a complete hardware-accelerated solution for creating an immersive acoustic experience in any application, based on the geometry/model of the virtual environment. The combination of NVIDIA GPU hardware and software is the ideal platform for this because:

By design, the GPU is best suited to highly parallel computations such as acoustic path tracing within the environment model.

The GPU graphics driver already has access to the elements of the environment’s “geometry”, which is being modeled for graphical rendering. This makes it much more efficient for the VRWorks Audio library to work with the hardware, and substantially improves performance.

NVIDIA VRWorks Audio SDK takes advantage of and complements other NVIDIA VR technologies to enable a GPU to give the user the sensation of true presence in virtual worlds.

The core technology of the NVIDIA VRWorks Audio SDK is a geometric acoustics ray-tracing engine, called NVIDIA Acoustic Raytracer (NVAR). This technology heightens the sense of realism in an interactive environment.

Background

Sound propagates through air as a wave, at approximately 340 m/s. Because of this wave nature, every obstacle the sound encounters not only reflects the sound wave but also absorbs, transmits, diffracts, scatters and disperses it.

Because modeling all of these phenomena is expensive, most audio effect processing in games today applies some type of predetermined filter to the audio source waveform. Audio effects generated with such predetermined filters are called parametric effects. For example, the reverberation in the main hall of a large cathedral can be modeled acoustically as a filter with a very long impulse response and exponentially decaying taps. In modeling the acoustic properties of an environment, it is critical to model the direct path and the early reflections. The direct path is the shortest unoccluded path between the source and the listener. Early reflections are the paths traced by reflected, diffracted or scattered wavefronts that arrive at the listener within roughly the first 200 ms after the direct path. Both the direct path and the early reflections carry more energy than the rest of the reverberation generated by the environment, which makes them easily noticeable to our auditory senses. They are critical for the brain to construct an “acoustic visualization” of the scene, as they give direct cues about the distance between the listener and large objects nearby.

Some contemporary 3D audio solutions attempt to model early reflections by taking into account the position of the listener within the geometry and then adding high-energy “peaks” to the filter at the delays at which reflections are expected to arrive at that position. Because such parametric effect filters are still precomputed, they do not work well when the listener or the source moves within the scene. Further, making the sound match the graphically rendered scene takes considerable effort and many iterations, and is a time-consuming process.
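The peak-insertion technique described above amounts to adding a few fixed taps to an existing filter. In this minimal sketch (the type and function names are hypothetical), the reflection delays are computed once for an assumed listener position, which makes the limitation concrete: nothing updates them when the listener or source moves.

```c
#include <stddef.h>

/* One precomputed early reflection: delay in samples and linear gain. */
typedef struct {
    size_t delay_samples;
    float gain;
} EarlyReflection;

/* Adds fixed early-reflection peaks to an existing reverb filter h[0..len-1].
   The delays were chosen once, for one assumed listener position, so the
   peaks stay where they are even if the listener or source later moves. */
static void add_early_reflection_peaks(float *h, size_t len,
                                       const EarlyReflection *refl,
                                       size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        if (refl[i].delay_samples < len) {   /* ignore peaks past the end */
            h[refl[i].delay_samples] += refl[i].gain;
        }
    }
}
```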

Instead of adding energy at predetermined early-reflection positions in the filter, an application can compute the early reflections itself from the source and listener coordinates. Some recent 3D audio solutions use a hybrid approach that combines such CPU-computed early reflections with precomputed parametric reverbs to construct an approximate acoustic model of the environment. However, this approach requires significant CPU power, which can take CPU time away from critical game logic and make the game unresponsive.

Today, most games implement positional audio (i.e., directionality) using HRTFs (head-related transfer functions) and environmental effects using parametric reverbs. Directionality is a well-understood problem, and HRTFs work reasonably well in most cases. However, creating good static parametric reverbs can be challenging, and making them sound good in a dynamic environment (one that responds to changes in the environment, orientation, location, etc.) is almost impossible without spending an unreasonable amount of CPU computation.

NVIDIA VRWorks Audio is the only fully hardware-accelerated, path-traced audio solution that creates a complete acoustic image of the environment in real time without requiring any predetermined filters. The VRWorks Audio library does not require any “pre-baked” knowledge of the scene. As the application loads the scene, the acoustic model is built and updated on the fly, and audio effect filters are generated and applied to the sound source waveforms in real time. This approach saves audio designers and engineers a tremendous amount of time, because it lets them focus on designing soundscapes rather than on how to render them well. Rendering is handled automatically by the NVAR library.

For example, a typical game level containing a large building with multiple rooms and architectural features requires fine-tuning of the audio effects in each room. Generating these effects accurately is an iterative process that takes several person-weeks of effort. With VRWorks Audio, this tuning time drops to zero.

Technology

As discussed in the previous section, creating an immersive auditory environment requires modeling sound propagation phenomena such as reflection, diffraction and scattering. NVIDIA VRWorks Audio approximates most of these phenomena using the inherently parallel computational capability of the GPU. Written using CUDA and the NVIDIA OptiX Ray Tracing Engine, the NVAR library builds an acoustic model of the environment passed to it and constructs filters that represent the environment, in real time. The filters take into account not only the structure of the environment (also referred to as the geometry), but also the material properties of the elements within that geometry, and the positions and orientations of the sources and the listener. For each source/listener pair, the library builds a pair of high-resolution (sample rates up to 48000 Hz), long (up to 2 seconds) filters. The two filters model how sound from that source is heard by the listener’s left and right ears at the listener’s position and orientation. These per-source filters are then applied to the individual source sounds to yield the processed, “wet” audio, which includes all the environmental effects: directionality, reverberation, occlusion, attenuation, diffraction, transmission, etc. In the literature, such filters, generated in real time and applied by convolution, are referred to as convolution filters (in contrast with parametric filters).
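Applying a per-source filter pair is a convolution of the source signal with the left-ear and right-ear impulse responses. The sketch below shows the direct time-domain form, with hypothetical names. At the filter sizes quoted above (2 s at 48000 Hz, i.e. 96,000 taps per ear), this direct form would cost 96,000 multiply-adds per output sample per ear, about 9.2 × 10^9 per second per source, which is why long convolution filters are usually applied with FFT-based methods instead; the direct form is shown only because it states the operation most plainly.

```c
#include <stddef.h>

/* Renders a mono source through a left/right pair of FIR filters by direct
   (time-domain) convolution:
       out_l[n] = sum_k h_l[k] * x[n - k]
   and likewise for the right ear. Output buffers must hold
   x_len + h_len - 1 samples (the full linear convolution). */
static void apply_binaural_filters(const float *x, size_t x_len,
                                   const float *h_l, const float *h_r,
                                   size_t h_len,
                                   float *out_l, float *out_r)
{
    if (x_len == 0 || h_len == 0) {
        return;                                  /* nothing to convolve */
    }
    size_t out_len = x_len + h_len - 1;
    for (size_t n = 0; n < out_len; ++n) {
        float acc_l = 0.0f, acc_r = 0.0f;
        /* Only taps k for which x[n - k] lies inside the input contribute. */
        size_t k_min = (n >= x_len) ? n - x_len + 1 : 0;
        size_t k_max = (n < h_len) ? n : h_len - 1;
        for (size_t k = k_min; k <= k_max; ++k) {
            acc_l += h_l[k] * x[n - k];
            acc_r += h_r[k] * x[n - k];
        }
        out_l[n] = acc_l;
        out_r[n] = acc_r;
    }
}
```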

NVIDIA VRWorks Audio SDK

The NVIDIA VRWorks Audio SDK consists of a library, a set of APIs, sample applications and documentation for application developers who want immersive audio in their applications. All APIs are designed as standard C APIs.

NVIDIA is also releasing a UE4 game engine plugin which facilitates easy integration of this technology into games which use UE4. Plugins for other game engines and audio middleware are under development and will be released later this year.

Using VRWorks Audio in an Application

This section provides an overview of how to integrate VRWorks Audio technology into an application/game using the C-API.

In general, programming NVAR is accomplished in three stages:

Initialization and setup
Real-time filter generation and effect processing (in main loop)
Clean-up

The graphic below illustrates the general application flow when using the VRWorks Audio APIs. The sample code shows the relevant APIs to use in each stage of the application.

In the main application loop (labeled “Game Loop” below), the application repeatedly generates two filters for each source/listener pair. These filters represent the listener’s acoustic environment with respect to that source. The VRWorks Audio library provides APIs either to read the filter impulse responses for further processing by the application, or to filter the audio frames with an optimized filter implementation and return the processed audio.

Application Flow

Application Flow - API Reference

          
          Global Init:
                       nvarInitialize ();
                       nvarCreate ();
          Global Setup:
                       nvarCreateMaterials ();
                       nvarCreateMesh ();
          Game Loop:
              Setup:
                       nvarCreateSource ();       // Add sources (when applicable)
                       nvarSetSource ();          // Change source position and orientation
                       nvarSetListener ();        // Change listener position and orientation
                       nvarTransformMesh ();      // Move geometry within the scene (where applicable)
              Filter Gen:
                       nvarTraceAudio ();         // Generate filters
              Filter Apply:
                       nvarApplySourceFilters (); // Apply effects to the audio stream
              Cleanup:
                       nvarDestroySource ();      // Terminate a source (when applicable)
          Global Cleanup:
                       nvarDestroyMesh ();
                       nvarDestroy ();
                       nvarFinalize ();

Within the main loop, the application can dynamically update source and listener positions and orientations, and add, modify or delete geometry objects; NVAR incorporates these changes when it calculates the next set of filters.
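When the application reads the filter impulse responses and applies them itself instead of calling nvarApplySourceFilters, it has to filter audio frame by frame, carrying each source's input history across frame boundaries so that consecutive frames join seamlessly. Since the filters may be regenerated on every loop iteration, the taps are passed per call. The following sketch shows a streaming FIR filter of this kind. All names are hypothetical and MAX_TAPS is deliberately tiny; it illustrates the technique and is not the library's optimized filter implementation.

```c
#include <stddef.h>
#include <string.h>

#define MAX_TAPS 64   /* illustrative limit for this sketch only */

/* Per-source streaming FIR state: the last (MAX_TAPS - 1) input samples,
   oldest first, so that consecutive frames are filtered seamlessly. */
typedef struct {
    float history[MAX_TAPS - 1];
} FirState;

/* Filters one frame of mono input with taps h[0..h_len-1]
   (requires h_len <= MAX_TAPS). The taps are passed on every call, so a
   filter regenerated for the next frame is used without resetting state. */
static void fir_process_frame(FirState *st, const float *h, size_t h_len,
                              const float *in, float *out, size_t frame_len)
{
    const size_t hist_len = MAX_TAPS - 1;
    for (size_t n = 0; n < frame_len; ++n) {
        float acc = 0.0f;
        for (size_t k = 0; k < h_len; ++k) {
            /* x[n - k] comes from this frame if n >= k, otherwise from the
               history, whose last element is the most recent past sample. */
            float xk = (n >= k) ? in[n - k] : st->history[hist_len - (k - n)];
            acc += h[k] * xk;
        }
        out[n] = acc;
    }
    /* Shift the newest input samples into the history buffer. */
    if (frame_len >= hist_len) {
        memcpy(st->history, in + frame_len - hist_len, hist_len * sizeof(float));
    } else {
        memmove(st->history, st->history + frame_len,
                (hist_len - frame_len) * sizeof(float));
        memcpy(st->history + (hist_len - frame_len), in, frame_len * sizeof(float));
    }
}
```

Swapping taps abruptly between frames can produce audible discontinuities; crossfading between the outputs of the old and new filters is a common remedy.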

Audio Path-tracing

The following diagram illustrates how NVAR traces audio paths through the geometry to build an acoustic model of the environment.

The green elements in the diagram are the meshes/objects in the geometry, which are passed to the library as part of application initialization. NVAR supports dynamically changing geometry: any of the elements can be added, deleted, modified or transformed at run time, and the changes take effect immediately. When creating the geometry meshes, the application specifies a material for each object/mesh. The API provides a library of predefined materials that can be used as is, and developers can also define custom audio materials.
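As a rough illustration of why material choice matters, the sketch below models a material with a single, hypothetical absorption coefficient (the fraction of incident energy absorbed per reflection) and computes how much energy a multi-bounce path retains. Real acoustic materials absorb differently at different frequencies; this struct and function are illustrative assumptions, not the VRWorks Audio material API.

```c
#include <stddef.h>

/* Hypothetical single-band audio material: the fraction of incident sound
   energy absorbed at each reflection (0 = perfectly reflective,
   1 = fully absorbent). Frequency dependence is ignored in this sketch. */
typedef struct {
    const char *name;
    double absorption;
} AudioMaterial;

/* Fraction of energy left on a path after reflecting off each material in
   order: the product of (1 - absorption) over all bounces. */
static double path_energy_after_bounces(const AudioMaterial *const *bounces,
                                        size_t count)
{
    double energy = 1.0;
    for (size_t i = 0; i < count; ++i) {
        energy *= 1.0 - bounces[i]->absorption;
    }
    return energy;
}
```

With illustrative absorption values of 0.02 and 0.5, a path that bounces off both surfaces keeps 0.98 × 0.5 = 0.49 of its energy.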

In the example above, two audio sources are being modeled (Source1 and Source2). The GPU traces various paths between each source and the listener. Three such paths (one direct and two indirect) between Source1 and the listener are shown in the diagram. Note that although the diagram shows only reflected paths, the library also models other effects that are important for audio, such as occlusion and diffraction. Thousands of such paths are traced in real time for each source/listener pair to generate that pair’s left-ear and right-ear filters.

Presets

The NVIDIA VRWorks Audio API uses several parameters that control the strength of the audio effects generated. To provide different levels of effect, the library exposes three effect strength presets (Low, Medium and High). Developers can change some of the parameters in these presets via the exposed APIs. Together, the effect strength presets and these APIs allow developers to achieve the acoustic experience they want. The effect strength can be controlled per source; i.e., each source can have different effects applied, if necessary. Figure 3 shows a room impulse response generated by VRWorks Audio for a mid-size room with the low and high effect strength presets.

Illustrative filter impulse response with low and high effect preset generated by VRWorks

In addition to the effect strength presets, the library also exposes two computational complexity presets: Low compute and High compute. The compute presets adjust internal parameters according to the GPU computational resources available for NVAR processing. For example, the application should specify the low-compute preset when running on a low-end GPU, and the high-compute preset when running on a high-end GPU or in a multi-GPU environment. The compute presets thus enable scalability across a wide range of system configurations while minimizing performance impact.
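Besides picking the compute preset by GPU class, an application could also adapt it at run time. One possible approach, sketched below purely as an assumption, is to measure how long the last filter-generation pass took and fall back to the low-compute preset when it exceeds a per-frame budget. The enum only mirrors the two preset names from the text; it is not the VRWorks Audio API, and the 0.4 headroom factor is an arbitrary illustrative threshold.

```c
/* Hypothetical mirror of the two compute presets; not the VRWorks Audio API. */
typedef enum { COMPUTE_PRESET_LOW, COMPUTE_PRESET_HIGH } ComputePreset;

/* Chooses a preset from the GPU time the last filter-generation pass took,
   compared with the per-frame time budget the application allows for audio.
   The gap between the two thresholds (budget and 0.4 * budget, an arbitrary
   illustrative factor) keeps the choice from flipping every frame. */
static ComputePreset choose_compute_preset(ComputePreset current,
                                           double last_trace_ms,
                                           double budget_ms)
{
    if (last_trace_ms > budget_ms) {
        return COMPUTE_PRESET_LOW;       /* over budget: scale down */
    }
    if (current == COMPUTE_PRESET_LOW && last_trace_ms < 0.4 * budget_ms) {
        return COMPUTE_PRESET_HIGH;      /* ample headroom: scale up */
    }
    return current;                      /* otherwise keep the current preset */
}
```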

Unreal Engine 4 Plugin (based on UE4 4.15)

NVIDIA VRWorks Audio SDK also includes a plugin for the Unreal Engine 4 (UE4). The plugin allows a game developer to quickly add VRWorks Audio support to the game using the UE4 editor. Figure 4 shows the architecture for the Unreal Engine 4 plugin for VRWorks Audio.

Architecture of Unreal Engine 4 plugin for VRWorks Audio

To use VRWorks Audio in a UE4-based application, one needs to configure the geometry and audio sources appropriately. Geometry configuration tells the VRWorks Audio library which geometry meshes are to be used (are visible) for audio simulation. Walls, floors, ceilings, furniture and doors are typical examples of meshes that should be made visible to VRWorks Audio. Along with visibility, one can also specify the material properties associated with each mesh. Audio source configuration tells the VRWorks Audio library about the audio properties of a particular sound source. The audio properties currently available include the strength of the sound and the direct/indirect path gains.
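Assuming a source's filter can be split into a direct-path part and an indirect (reflected and diffracted) part, the per-source strength and direct/indirect path gains can be pictured as a weighted sum of the two. The struct and function below are hypothetical and only illustrate the idea; they are not the UE4 plugin's data structures.

```c
#include <stddef.h>

/* Hypothetical per-source settings modeled on the properties listed for the
   UE4 plugin; the field names are illustrative only. */
typedef struct {
    float strength;       /* overall source level (linear)              */
    float direct_gain;    /* scales the direct-path part of the filter  */
    float indirect_gain;  /* scales the reflected/diffracted part       */
} SourceAudioProps;

/* Builds the filter used for one ear by mixing a direct-path response and an
   indirect response of equal length, then applying the overall strength. */
static void mix_direct_indirect(const SourceAudioProps *p,
                                const float *h_direct, const float *h_indirect,
                                float *h_out, size_t len)
{
    for (size_t n = 0; n < len; ++n) {
        h_out[n] = p->strength *
                   (p->direct_gain * h_direct[n] + p->indirect_gain * h_indirect[n]);
    }
}
```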

VRWorks Audio in Action

Although VRWorks Audio technology has been designed with VR games in mind, it can be used in many other applications. For example:

Optis has integrated VRWorks Audio into its HIM solution, a product used for 3D virtual prototyping. You can see a demo of VRWorks Audio in action at the Optis booth at the GPU Technology Conference in San Jose, May 8-11, 2017.
NVIDIA has integrated VRWorks Audio technology into a UE4-based game, Unreal Tournament, using the UE4 plugin for VRWorks Audio.
VRWorks Audio SDK includes a sample application which loads the geometry of a palace and demonstrates modeling of the audio effects within this geometry.