Building Spatial Intelligence from Real-World 3D Data Using Deep-Learning Framework fVDB

Generative physical AI models can understand and execute actions with fine or gross motor skills within the physical world. Understanding and navigating the 3D space of the physical world requires spatial intelligence. Achieving spatial intelligence in physical AI involves converting the real world into AI-ready virtual representations that models can understand.

But building spatial intelligence from real-world data requires infrastructure that can handle the massive scale and high resolution of reality. Developers typically have to piece together different libraries to build a framework for spatial intelligence. This patchwork approach often leads to bugs and inefficiencies that limit the scope of the virtual environment. Without a unified framework, copying data between multiple data structures introduces performance bottlenecks, size limits, and unnecessary work.

To provide a powerful, coherent framework that can handle physical AI at reality scale, NVIDIA built fVDB, a deep-learning framework designed for sparse, large-scale, and high-performance spatial intelligence. 

fVDB is a game-changer for practitioners and researchers working on deep-learning applications that involve large-scale 3D data, such as those typically associated with real-world simulations or measurements. Examples of such sparse large-scale 3D data include point clouds, radiance fields, physical quantities for simulations, signed distance functions, and LiDAR. 
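To make the sparsity point concrete, here is a minimal, stdlib-only Python sketch (not fVDB's actual API; the function name is illustrative) of the core idea behind sparse grids: features are stored only at occupied voxels, so memory scales with the surface or measurement footprint rather than the full dense volume.

```python
def voxelize(points, voxel_size):
    """Map 3D points to integer voxel coordinates, keeping only occupied cells."""
    grid = {}
    for x, y, z in points:
        ijk = (int(x // voxel_size), int(y // voxel_size), int(z // voxel_size))
        grid[ijk] = grid.get(ijk, 0) + 1  # feature here: point count per voxel
    return grid

points = [(0.1, 0.2, 0.3), (0.15, 0.22, 0.31), (5.0, 5.0, 5.0)]
grid = voxelize(points, voxel_size=0.5)
# Three points collapse into two occupied voxels; a dense 0.5-unit grid
# covering the same extent would need thousands of cells.
```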

fVDB is so named because it uses OpenVDB to efficiently represent features: fVDB combines deep-learning operators with NanoVDB, the NVIDIA GPU-accelerated implementation of OpenVDB. OpenVDB, the industry standard for efficient storage and simulation of sparse volumetric data, is open sourced by the Academy Software Foundation and managed by a Technical Steering Committee chaired by NVIDIA’s Ken Museth.

Video 1. fVDB provides 3D deep-learning infrastructure for massive datasets and high resolutions

fVDB is an open-source extension to PyTorch that enables a complete set of deep-learning operations to be performed on large 3D data. Examples of these operations are attention and convolution, fundamental building blocks of widely used machine learning architectures such as transformers and convolutional neural networks (CNNs). While these operations are traditionally implemented in 1D and 2D (in PyTorch and TensorFlow, for example), fVDB enables efficient implementations in 3D on large, sparse datasets.
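The key idea behind sparse 3D convolution can be sketched in a few lines of stdlib Python (illustrative only, not fVDB's GPU implementation): the kernel is evaluated only at occupied voxels, and only occupied neighbors contribute, so cost scales with the number of active voxels rather than with the full volume.

```python
from itertools import product

def sparse_conv3d(grid, kernel):
    """grid: {(i,j,k): feature}; kernel: {(di,dj,dk): weight} over a 3x3x3 stencil."""
    out = {}
    for (i, j, k) in grid:                # output sites = occupied voxels only
        acc = 0.0
        for (di, dj, dk), w in kernel.items():
            nb = (i + di, j + dj, k + dk)
            if nb in grid:                # empty space is skipped entirely
                acc += w * grid[nb]
        out[(i, j, k)] = acc
    return out

# Box-filter kernel over the 27-voxel neighborhood
kernel = {offset: 1.0 / 27 for offset in product((-1, 0, 1), repeat=3)}
grid = {(0, 0, 0): 1.0, (1, 0, 0): 2.0}
out = sparse_conv3d(grid, kernel)         # only 2 outputs computed, not a dense volume
```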

Key capabilities include: 

  • Compatibility with existing VDB datasets: fVDB can read and write existing VDB datasets out of the box. It interoperates with other libraries and tools, such as Warp for Pythonic spatial computing, and the Kaolin Library for 3D deep learning. Adopting fVDB into your existing AI workflow is seamless.
  • Unified API for differentiably:
    • Building and training neural networks (convolution, attention, pooling, and more)
    • Ray tracing and rendering (ray marching, Gaussian splatting, volume rendering)
    • Building sparse grids on the GPU (from points, meshes, coordinates, and so on)
    • Sampling and splatting sparse volumes
    • Processing non-uniform batches of data efficiently on the GPU 
  • Faster and more scalable: fVDB supports 4x larger spatial scale and is 3.5x faster than prior frameworks. 
  • More features: fVDB provides 10x more operators than prior frameworks. It provides easy-to-use APIs so you don’t have to patch together different libraries. 
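The non-uniform batching capability listed above can be sketched with a simple packing scheme (stdlib Python; the helper names are illustrative, not fVDB's API): variable-length point clouds are concatenated into one flat array plus an offsets table, so a single kernel can sweep the whole batch without padding.

```python
def pack(batches):
    """Pack variable-length lists into one flat list plus start offsets."""
    flat, offsets = [], [0]
    for b in batches:
        flat.extend(b)
        offsets.append(len(flat))
    return flat, offsets

def unpack(flat, offsets, i):
    """Recover the i-th element of the batch from the packed layout."""
    return flat[offsets[i]:offsets[i + 1]]

clouds = [[1, 2, 3], [4], [5, 6]]          # three point clouds of different sizes
flat, offsets = pack(clouds)               # flat = [1,2,3,4,5,6], offsets = [0,3,4,6]
```

On a GPU the flat array maps to one contiguous buffer, which is why this layout avoids both padding waste and per-sample kernel launches.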

fVDB enables spatial intelligence for a variety of applications, including:

  • Neural shape-reconstruction from over 250 million 3D points 
  • City-scale digital twins with neural radiance fields (NeRFs)
  • Large-scale 3D generative AI
  • Physics super-resolution, where neural networks are used to add high-resolution 3D detail to faster low-resolution simulations
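The physics super-resolution idea can be sketched in 1D with stdlib Python (illustrative only: the real models are 3D neural networks, and the residual predictor here is a zero-returning stub standing in for a trained network):

```python
def upsample_linear(coarse, factor):
    """Linearly interpolate a 1D coarse field to factor-times resolution."""
    fine = []
    for i in range(len(coarse) - 1):
        for s in range(factor):
            t = s / factor
            fine.append((1 - t) * coarse[i] + t * coarse[i + 1])
    fine.append(coarse[-1])
    return fine

def predict_residual(field):
    # Placeholder: a trained network would predict high-frequency detail here.
    return [0.0] * len(field)

coarse = [0.0, 1.0, 0.0]                  # cheap low-resolution simulation output
up = upsample_linear(coarse, 4)           # 9 samples at 4x resolution
fine = [u + r for u, r in zip(up, predict_residual(up))]
```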

fVDB applications

fVDB is already in use with the NVIDIA Research, NVIDIA DRIVE, and NVIDIA Omniverse teams as a framework to enable state-of-the-art results in spatial intelligence research and applications.

Surface reconstruction

Neural Kernel Surface Reconstruction (NKSR) implements a new algorithm for reconstructing high-fidelity surfaces from large point clouds. NKSR is a large-scale kernel solver, based on fVDB and neural kernels, capable of reconstructing a high-fidelity surface spanning kilometers from 350 million points in 2 minutes on eight GPUs.

Video 2. fVDB is used to implement Neural Kernel Surface Reconstruction, a state-of-the-art method for reconstructing surfaces from point clouds

Generative AI 

XCube combines diffusion-based generative models with sparse voxel hierarchies and can generate scenes with an effective spatial resolution of 1024³ voxels in under 30 seconds. Built on fVDB, XCube reaches these high resolutions by progressively subdividing the sparse voxel hierarchy. Generated voxels can contain rich attributes such as textures or semantics.

Video 3. A set of contiguous images is fed to an XCube-style network generating 3D fVDB grids
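The progressive subdivision behind this approach can be sketched with stdlib Python (illustrative, not XCube's implementation): each occupied coarse voxel splits into eight children, and only children passing an occupancy test survive, so resolution doubles per axis at each level without a dense memory blow-up.

```python
from itertools import product

def subdivide(coarse, keep):
    """coarse: set of (i,j,k) voxels; keep: occupancy predicate on child coords."""
    fine = set()
    for i, j, k in coarse:
        for di, dj, dk in product((0, 1), repeat=3):
            child = (2 * i + di, 2 * j + dj, 2 * k + dk)
            if keep(child):               # prune children in empty space
                fine.add(child)
    return fine

level0 = {(0, 0, 0)}
# Toy occupancy test; in practice a network predicts which children are occupied.
level1 = subdivide(level0, keep=lambda c: sum(c) % 2 == 0)
# 8 candidate children, 4 kept: resolution doubled per axis, memory grew by 4x, not 8x.
```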

NeRFs 

NeRF-XL is a principled algorithm for distributing NeRFs across multiple GPUs. NeRF-XL decomposes large scenes into smaller chunks distributed onto separate GPUs. It reformulates the training and rendering procedures so that multi-GPU training is mathematically equivalent to the classic single-GPU case. fVDB is the underlying framework that accelerates ray marching in the neural rendering process and is parallelizable over multiple devices.

Video 4. fVDB helps NeRF-XL to efficiently scale multi-GPU NeRFs that span huge areas of many square kilometers
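The ray-marching loop that fVDB accelerates can be sketched with stdlib Python (illustrative only: fVDB's real kernels are GPU-parallel and use hierarchical empty-space skipping, which this toy version only hints at via the sparse dictionary lookup):

```python
import math

def march(origin, direction, grid, step, n_steps):
    """Accumulate opacity along a ray through a sparse density grid of
    unit-sized voxels; grid maps (i, j, k) -> density."""
    transmittance, acc = 1.0, 0.0
    x, y, z = origin
    for _ in range(n_steps):
        ijk = (math.floor(x), math.floor(y), math.floor(z))
        sigma = grid.get(ijk, 0.0)             # empty voxels cost one lookup
        alpha = 1.0 - math.exp(-sigma * step)
        acc += transmittance * alpha           # standard volume-rendering quadrature
        transmittance *= 1.0 - alpha
        x += direction[0] * step
        y += direction[1] * step
        z += direction[2] * step
    return acc

# A dense blocker two voxels down the +x axis renders as nearly opaque.
opaque = march((0.5, 0.5, 0.5), (1.0, 0.0, 0.0), {(2, 0, 0): 100.0}, step=0.5, n_steps=10)
empty = march((0.5, 0.5, 0.5), (1.0, 0.0, 0.0), {}, step=0.5, n_steps=10)
```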

NVIDIA fVDB NIM microservices

Coming soon, fVDB functionality will be available as NVIDIA NIM microservices that enable developers to incorporate the fVDB core framework into Universal Scene Description (OpenUSD) workflows. fVDB NIM microservices generate OpenUSD-based geometry in NVIDIA Omniverse.

  • fVDB Mesh Generation NIM: Generates an OpenUSD-based mesh, rendered by Omniverse Cloud APIs, from point cloud data. 
  • fVDB Physics Super-Res NIM: Performs AI super-resolution on a frame or sequence of frames to generate an OpenUSD-based high-resolution physics simulation. 
  • fVDB NeRF-XL NIM: Generates large-scale NeRFs in OpenUSD using NVIDIA Omniverse Cloud APIs.

Learn more about how to integrate generative AI into your OpenUSD workflow using USD NIM microservices.

Conclusion

Developed by NVIDIA, fVDB is a deep-learning framework for sparse, large-scale, high-performance spatial intelligence. It builds NVIDIA-accelerated AI operators on top of OpenVDB to enable digital twins at reality scale, neural radiance fields, 3D generative AI, and more.

Apply for early access to fVDB, which includes access to the fVDB PyTorch extension. 

Coming soon, you’ll be able to follow along with fVDB development through AcademySoftwareFoundation/openvdb on GitHub. While you wait, check out the pull request to merge fVDB. fVDB is expected to be merged into OpenVDB shortly.  

Join us at SIGGRAPH 2024 for Introduction to fVDB: Hands-On With Large-Scale Spatial Intelligence, a workshop that introduces the concepts in fVDB with an interactive tutorial to get started.

To learn more, see the fVDB announcement from the Academy Software Foundation.
