Building Spatial Intelligence from Real-World 3D Data Using Deep-Learning Framework fVDB

Jul 29, 2024

By Ken Museth, Robert Jensen, Jonathan Swartz, Francis Williams and Jiahui Huang

Generative physical AI models can understand and execute actions with fine or gross motor skills within the physical world. Understanding and navigating the 3D space of the physical world requires spatial intelligence. Achieving spatial intelligence in physical AI involves converting the real world into AI-ready virtual representations that the model can understand.

But building spatial intelligence from real-world data requires infrastructure that can handle the massive scale and high resolution of reality. Typically, developers have to piece together different libraries to build a framework for spatial intelligence. This patchwork approach often leads to bugs and inefficiencies, limiting the scope of the virtual environment. Without a unified framework, data must be copied between multiple data structures, which introduces performance bottlenecks, limits the size of the data that can be handled, and creates unnecessary work.

To provide a powerful, coherent framework that can handle physical AI at reality scale, NVIDIA built fVDB, a deep-learning framework designed for sparse, large-scale, and high-performance spatial intelligence.

fVDB is a game-changer for practitioners and researchers working on deep-learning applications that involve large-scale 3D data, such as those typically associated with real-world simulations or measurements. Examples of such sparse large-scale 3D data include point clouds, radiance fields, physical quantities for simulations, signed distance functions, and LiDAR.

fVDB is so named because it uses OpenVDB to efficiently represent features. It combines deep-learning operators with NanoVDB, the NVIDIA GPU-accelerated implementation of OpenVDB. OpenVDB, the industry standard for efficient storage and simulation of sparse volumetric data, is an open-source project of the Academy Software Foundation and is managed by a Technical Steering Committee chaired by NVIDIA’s Ken Museth.

Video 1. fVDB provides 3D deep-learning infrastructure for massive datasets and high resolutions

fVDB is an open-source extension to PyTorch that enables a complete set of deep-learning operations to be performed on large 3D data. Examples of these deep-learning operations are attention and convolution, which are fundamental building blocks in celebrated machine learning architectures like transformers and convolutional neural networks (CNNs). While they are traditionally implemented for 1D and 2D data (in PyTorch and TensorFlow, for example), fVDB enables efficient implementations in 3D when applied to large sparse data sets.
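To make the idea of convolution on sparse 3D data concrete, here is a minimal pure-Python sketch. It does not use the fVDB API: the function `sparse_conv3d`, the dictionary-based storage, and the sample data are all hypothetical stand-ins chosen for illustration. The point it shows is that the work scales with the number of active voxels, not with the volume of the bounding box.

```python
from itertools import product


def sparse_conv3d(features, weights):
    """Apply a 3x3x3 kernel to a sparse 3D scalar field.

    features: dict mapping (i, j, k) voxel coordinates to float values;
              coordinates that are absent are treated as empty (zero).
    weights:  dict mapping (di, dj, dk) offsets in {-1, 0, 1}^3 to floats.

    Like deep-learning "convolution", this is a cross-correlation:
    out[x] = sum over offsets d of weights[d] * features[x + d].
    Outputs are computed only at active input voxels.
    """
    out = {}
    for (i, j, k) in features:
        acc = 0.0
        for (di, dj, dk), w in weights.items():
            neighbor = (i + di, j + dj, k + dk)
            if neighbor in features:  # empty neighbors contribute nothing
                acc += w * features[neighbor]
        out[(i, j, k)] = acc
    return out


# Hypothetical data: three active voxels inside a 501^3 bounding box.
box_kernel = {offset: 1.0 for offset in product((-1, 0, 1), repeat=3)}
active = {(0, 0, 0): 1.0, (1, 0, 0): 2.0, (500, 500, 500): 4.0}
result = sparse_conv3d(active, box_kernel)
print(result)
```

A dense implementation would have to visit roughly 125 million cells for this bounding box. The sparse version visits three voxels and at most 27 neighbors for each.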

Key capabilities include:

Compatibility with existing VDB datasets: fVDB can read and write existing VDB datasets out of the box. It interoperates with other libraries and tools, such as Warp for Pythonic spatial computing, and the Kaolin Library for 3D deep learning. Adopting fVDB into your existing AI workflow is seamless.
Unified API for differentiably:
- Building and training neural networks (convolution, attention, pooling, and more)
- Ray tracing and rendering (ray marching, Gaussian splatting, volume rendering)
- Building sparse grids on the GPU (from points, meshes, coordinates, and so on)
- Sampling and splatting sparse volumes
- Processing non-uniform batches of data efficiently on the GPU
Faster and more scalable: fVDB supports 4x larger spatial scales and is 3.5x faster than prior frameworks.
More features: fVDB provides 10x more operators than prior frameworks. It provides easy-to-use APIs so you don’t have to patch together different libraries.
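Two of the capabilities listed above, building sparse grids from points and processing non-uniform batches, can be illustrated with a short conceptual sketch. This is plain Python rather than fVDB code: `voxelize_batch`, the `offsets` layout, and the sample points are assumptions made for illustration, and a GPU implementation would look quite different.

```python
import math


def voxelize_batch(point_clouds, voxel_size):
    """Build one sparse voxel set per point cloud in a non-uniform batch.

    point_clouds: list of point clouds; each is a list of (x, y, z) tuples,
                  and the clouds may have different lengths.
    Returns (coords, offsets). coords is a flat list of integer voxel
    coordinates for the whole batch, and coords[offsets[b]:offsets[b + 1]]
    holds the voxels of cloud b. No padding is needed for uneven sizes.
    """
    coords, offsets = [], [0]
    for cloud in point_clouds:
        # Points that fall in the same voxel collapse into one entry.
        voxels = {
            tuple(math.floor(c / voxel_size) for c in point)
            for point in cloud
        }
        coords.extend(sorted(voxels))
        offsets.append(len(coords))
    return coords, offsets


# Hypothetical batch: two clouds with different point counts.
batch = [
    [(0.1, 0.2, 0.3), (0.4, 0.1, 0.2), (2.5, 0.0, 0.0)],
    [(-0.5, 0.5, 0.5)],
]
coords, offsets = voxelize_batch(batch, voxel_size=1.0)
for b in range(len(batch)):
    print(b, coords[offsets[b]:offsets[b + 1]])
```

Storing the batch as one flat list plus offsets avoids padding every item to the size of the largest one, which matters when scene sizes vary by orders of magnitude.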

fVDB enables spatial intelligence for a variety of applications, including:

Neural shape-reconstruction from over 350 million 3D points
City-scale digital twins with neural radiance fields (NeRFs)
Large-scale 3D generative AI
Physics super-resolution, where neural networks are used to add high-resolution 3D detail to faster low-resolution simulations

fVDB applications

fVDB is already in use with the NVIDIA Research, NVIDIA DRIVE, and NVIDIA Omniverse teams as a framework to enable state-of-the-art results in spatial intelligence research and applications.

Surface reconstruction

Neural Kernel Surface Reconstruction (NKSR) implements a new algorithm for reconstructing high-fidelity surfaces from large point clouds. NKSR is a large-scale kernel solver based on fVDB and neural kernels, capable of reconstructing a high-fidelity surface spanning kilometers from 350 million points in 2 minutes on eight GPUs.

Video 2. fVDB is used to implement Neural Kernel Surface Reconstruction, a state-of-the-art method for reconstructing surfaces from point clouds

Generative AI

XCube combines diffusion generative models with sparse voxel hierarchies and can generate scenes with an effective spatial resolution of 1024³ voxels in under 30 seconds. XCube is built on fVDB, and it reaches high resolutions by progressively subdividing the sparse voxel hierarchy. Generated voxels can contain rich attributes such as textures or semantics.
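The following sketch illustrates the general coarse-to-fine idea of subdividing only the active voxels of a sparse grid. It is not XCube or fVDB code. In particular, the `near_sphere` test is a hypothetical stand-in for the decision that a generative model would make about which child voxels to keep, and `refine` and its parameters are names invented for this example.

```python
import math
from itertools import product


def near_sphere(coord, resolution, radius=0.4):
    """Stand-in occupancy test on a unit cube centered at the origin.

    Returns True when the voxel center lies within one voxel width of a
    sphere surface. A learned model would supply this decision in practice.
    """
    center = [(c + 0.5) / resolution - 0.5 for c in coord]
    dist = math.sqrt(sum(x * x for x in center))
    return abs(dist - radius) <= 1.0 / resolution


def refine(levels, base_resolution=4):
    """Subdivide active voxels level by level; return (resolution, count) pairs."""
    resolution = base_resolution
    voxels = {
        c for c in product(range(resolution), repeat=3)
        if near_sphere(c, resolution)
    }
    counts = [(resolution, len(voxels))]
    for _ in range(levels):
        resolution *= 2
        children = set()
        for (i, j, k) in voxels:
            # Each active voxel splits into 8 children; inactive space is
            # never visited, so it costs nothing at finer levels.
            for (di, dj, dk) in product((0, 1), repeat=3):
                child = (2 * i + di, 2 * j + dj, 2 * k + dk)
                if near_sphere(child, resolution):
                    children.add(child)
        voxels = children
        counts.append((resolution, len(voxels)))
    return counts


counts = refine(levels=3)
for resolution, n in counts:
    print(f"{resolution}^3 grid: {n} active voxels of {resolution ** 3}")
```

Because only surviving voxels are subdivided, the number of active voxels for a surface-like shape grows far more slowly than the dense voxel count, which increases eightfold with every doubling of resolution.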

Video 3. A set of contiguous images is fed to an XCube-style network generating 3D fVDB grids

NeRFs

NeRF-XL is a principled algorithm for distributing NeRFs across multiple GPUs. NeRF-XL decomposes large scenes into smaller chunks distributed onto separate GPUs. It reformulates the training and rendering procedures so that multi-GPU training is mathematically equivalent to the classic single-GPU case. fVDB is the underlying framework that accelerates ray marching in the neural rendering process and is parallelizable over multiple devices.
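One way to see why splitting a scene into chunks can be mathematically equivalent to rendering it whole is the standard front-to-back compositing identity used in volume rendering. The sketch below is not NeRF-XL or fVDB code: `march`, `merge`, and the sample values are hypothetical, and colors are single grayscale numbers for brevity. It marches one ray in a single pass, then marches two consecutive segments independently (as two devices might) and merges their partial results.

```python
import math


def march(samples):
    """Composite samples front to back along one ray segment.

    samples: list of (density, color, step_length) tuples in ray order.
    Returns (accumulated_color, transmittance) for the segment.
    """
    color, transmittance = 0.0, 1.0
    for density, sample_color, step in samples:
        alpha = 1.0 - math.exp(-density * step)
        color += transmittance * alpha * sample_color
        transmittance *= 1.0 - alpha
    return color, transmittance


def merge(front, back):
    """Combine two consecutive segments that were rendered independently."""
    front_color, front_t = front
    back_color, back_t = back
    # Light from the back segment is attenuated by everything in front of it.
    return front_color + front_t * back_color, front_t * back_t


# Hypothetical samples along one ray: (density, color, step_length).
ray = [(0.5, 0.2, 0.1), (2.0, 0.9, 0.1), (1.0, 0.4, 0.1), (3.0, 0.7, 0.1)]

whole = march(ray)                              # single pass over the full ray
split = merge(march(ray[:2]), march(ray[2:]))   # two segments, then merged

print(whole)
print(split)
```

Each segment only needs to report its accumulated color and transmittance, so the per-ray data exchanged between chunks stays small regardless of how many samples each chunk processes. The two results agree up to floating-point rounding.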

Video 4. fVDB helps NeRF-XL to efficiently scale multi-GPU NeRFs that span huge areas of many square kilometers

NVIDIA fVDB NIM microservices

Coming soon, fVDB functionality will be available as NVIDIA NIM microservices that enable developers to incorporate the fVDB core framework into Universal Scene Description (OpenUSD) workflows. fVDB NIM microservices generate OpenUSD-based geometry in NVIDIA Omniverse.

fVDB Mesh Generation NIM: Generates an OpenUSD-based mesh, rendered by Omniverse Cloud APIs, from point cloud data.
fVDB Physics Super-Res NIM: Performs AI super-resolution on a frame or sequence of frames to generate an OpenUSD-based high-resolution physics simulation.
fVDB NeRF-XL NIM: Generates large-scale NeRFs in OpenUSD using NVIDIA Omniverse Cloud APIs.

Learn more about how to integrate generative AI into your OpenUSD workflow using USD NIM microservices.

Conclusion

Developed by NVIDIA, fVDB is a deep-learning framework for sparse, large-scale, high-performance spatial intelligence. It builds NVIDIA-accelerated AI operators on top of OpenVDB to enable digital twins at reality scale, neural radiance fields, 3D generative AI, and more.

Apply for early access to fVDB, which includes access to the fVDB PyTorch extension.

Coming soon, you’ll be able to follow along with fVDB development through AcademySoftwareFoundation/openvdb on GitHub. While you wait, check out the pull request to merge fVDB. fVDB is expected to be merged into OpenVDB shortly.

Join us at SIGGRAPH 2024 for Introduction to fVDB: Hands-On With Large-Scale Spatial Intelligence, a workshop that introduces the concepts in fVDB with an interactive tutorial to get started.

To learn more, see the fVDB announcement from the Academy Software Foundation.

About the Authors

About Ken Museth
Ken Museth is a senior director in Simulation Technology and joined NVIDIA in early 2020 when he initiated the development of NanoVDB. He was previously Head of Research & Development in Simulations at Weta Digital, focusing on developing state-of-the-art VFX for the Avatar sequels. He is the creator of VDB, the lead architect of OpenVDB, and the chair of its Technical Steering Committee. Additionally, Ken worked six years for SpaceX on large-scale fluid dynamics simulations of the new Raptor rocket engine. Before joining Weta in 2017, he worked for a decade at DreamWorks Animation and Digital Domain, and prior to that for a decade as a researcher and full professor at Caltech and Linköping University. He holds a PhD in quantum dynamics from Copenhagen University and has been awarded a Technical Achievement Award from The Academy of Motion Picture Arts and Sciences. Ken is on the SIGGRAPH 2020 Technical Paper Committee.

About Robert Jensen
Robbie is a product marketing manager at NVIDIA who is working to enable adoption of NVIDIA’s game development SDKs, especially Nsight Developer Tools. He holds a bachelor’s degree in Computer Science from Connecticut College.

About Jonathan Swartz
Jonathan Swartz is a research scientist in the High-Fidelity Physics Research group at NVIDIA and is based in Wellington, Aotearoa, New Zealand. Before joining NVIDIA, Jonathan spent 16 years in the visual effects and animation industry in various research, engineering, and craftsperson roles creating simulated effects, images, and technology for over two dozen films including the Avatar films, The Hobbit trilogy, the Avengers films and Cloudy with a Chance of Meatballs.

About Francis Williams
Francis Williams is a research scientist at NVIDIA working at the intersection of computer vision, machine learning, and computer graphics. His research is a mix of theory and application, aiming to solve practical problems in elegant ways. In particular, he is very interested in 3D shape representations that can enable deep learning on “real-world” geometric datasets, which are often noisy, unlabeled, and very large. He completed his Ph.D. in 2021 at NYU, where he worked in the Math and Data Group and the Geometric Computing Lab. He is also the creator and maintainer of several open source projects, including NumpyEigen, Point Cloud Utils, and FML.

About Jiahui Huang
Jiahui Huang is a research intern at NVIDIA Toronto AI Lab. His research focuses on dynamic perception and 3D reconstruction using SLAM and deep learning.