NVIDIA Research: Learning Modular Scene Representations With Neural Scene Graphs

NVIDIA researchers will present their paper “Neural Scene Graph Rendering,” which introduces a neural scene representation inspired by traditional graphics scene graphs, at SIGGRAPH 2021, August 9-13.

Recent advances in neural rendering have pushed the boundaries of photorealistic rendering; StyleGAN, for example, produces realistic images of fictional people. The next big challenge is bringing these neural techniques into digital content-creation applications such as Maya and Blender. Meeting this challenge requires a new generation of neural scene models that offer artistic control and modularity comparable to those of classical 3D meshes and material representations.

“In order to kick off these developments, we needed to step back a little bit and scale down the scene complexity,” says Jonathan Granskog, the first author of the paper.

This is one reason why the images in the paper are reminiscent of the early years of computer graphics. However, the artistic control and the granularity of the neural elements are closer to what modern applications would need in order to integrate neural rendering into traditional authoring pipelines. The proposed approach allows learned neural elements to be organized into an (animated) scene graph, much like in standard authoring tools.

2D animation with neural shapes morphing between a number of tangram assemblies.

A neural element may represent, for instance, the geometry of a teapot or the appearance of porcelain. Each scene element is stored as an abstract, high-dimensional vector whose parameters are learned from images. During training, the method also learns how to manipulate and render these abstract vectors. For example, a vector representing a piece of geometry can be translated, rotated, bent, or twisted using a manipulator. Analogously, a material element can be altered by stretching its texture content, desaturating it, or changing its hue.
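To make the idea of a manipulator more concrete, here is a minimal Python sketch. It assumes (our assumption, not the paper's published architecture) that a manipulator is a small fully connected network that takes a latent vector together with a few transformation parameters and returns an edited vector of the same size. The names `Manipulator`, `init_layer`, `dense`, and `LATENT_DIM`, as well as the layer sizes, are hypothetical, and the weights are random rather than trained, so the outputs carry no meaning; the sketch only shows the data flow.

```python
import math
import random

LATENT_DIM = 8  # hypothetical size; learned scene elements would use much larger vectors


def init_layer(n_in, n_out, rng):
    """Random weights and zero biases for one fully connected layer."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(layer, x, activation=None):
    """Apply a fully connected layer (and an optional activation) to the list x."""
    weights, biases = layer
    out = [sum(w * xi for w, xi in zip(row, x)) + b
           for row, b in zip(weights, biases)]
    return [activation(v) for v in out] if activation else out


class Manipulator:
    """Hypothetical manipulator: (latent vector, parameters) -> edited latent vector.

    A two-layer network predicts an offset that is added to the input vector,
    so the output has the same size as the input and manipulators can be chained.
    The weights here are random; in the described method they are learned in training.
    """

    def __init__(self, n_params, hidden=16, seed=0):
        rng = random.Random(seed)
        self.n_params = n_params
        self.layer1 = init_layer(LATENT_DIM + n_params, hidden, rng)
        self.layer2 = init_layer(hidden, LATENT_DIM, rng)

    def __call__(self, latent, params):
        if len(latent) != LATENT_DIM or len(params) != self.n_params:
            raise ValueError("latent or parameter vector has the wrong size")
        h = dense(self.layer1, list(latent) + list(params), math.tanh)
        offset = dense(self.layer2, h)
        return [z + d for z, d in zip(latent, offset)]


# Usage: chain two geometry manipulators on a stand-in "teapot" vector.
rng = random.Random(42)
teapot = [rng.gauss(0.0, 1.0) for _ in range(LATENT_DIM)]
rotate = Manipulator(n_params=1, seed=1)  # hypothetical parameter: angle in radians
twist = Manipulator(n_params=1, seed=2)   # hypothetical parameter: twist strength
edited = twist(rotate(teapot, [math.pi / 4]), [0.3])
print(len(edited))  # 8
```

Because every manipulator in this sketch maps a vector to a vector of the same size, manipulators compose freely. A material manipulator, such as one that changes hue, would use the same interface with its own number of parameters.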

2D sprite animation where the geometry elements of the scene graph feature neural textures.

Because the optimizable components (vectors, manipulators, and the renderer) are very general, the approach can handle both 2D and 3D scenes without changes to the methodology. An artist composes a scene by organizing the vectors and manipulators into a scene graph. The scene graph is then collapsed into a stream of neural primitives, which a streaming neural renderer translates into an RGB image, much like a rasterizer turns a stream of triangles into an image.
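The collapse-and-stream idea can be sketched in a few lines of Python. This is a toy under stated assumptions: the `Node` class, `collapse`, `stream_render`, `translate`, and `desaturate` are hypothetical. In the paper, the vectors are abstract and the manipulators and renderer are learned networks; here, purely so the output can be checked by hand, the first three geometry entries are read as a disk (center x, center y, radius) and the material as an RGB color. Manipulators attached to a node apply to all of its descendants, with a node's own manipulators applied before those of its ancestors, as with transforms in a classical scene graph.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator, List, Optional, Sequence, Tuple

Vector = List[float]
Op = Callable[[Vector], Vector]      # a manipulator with its parameters already bound
Primitive = Tuple[Vector, Vector]    # (geometry vector, material vector)


@dataclass
class Node:
    """Hypothetical scene-graph node holding optional elements and manipulators."""
    geometry: Optional[Vector] = None
    material: Optional[Vector] = None
    geometry_ops: List[Op] = field(default_factory=list)
    material_ops: List[Op] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def collapse(node: Node, geo_ops: Sequence[Op] = (),
             mat_ops: Sequence[Op] = ()) -> Iterator[Primitive]:
    """Depth-first traversal that flattens the graph into a stream of primitives."""
    # A node's own manipulators run first, then those inherited from its ancestors.
    geo_ops = list(node.geometry_ops) + list(geo_ops)
    mat_ops = list(node.material_ops) + list(mat_ops)
    if node.geometry is not None and node.material is not None:
        g, m = node.geometry, node.material
        for op in geo_ops:
            g = op(g)
        for op in mat_ops:
            m = op(m)
        yield g, m
    for child in node.children:
        yield from collapse(child, geo_ops, mat_ops)


def stream_render(primitives: Iterable[Primitive], width: int, height: int):
    """Consume primitives one at a time into a fixed-size per-pixel accumulator."""
    rgb_sum = [[[0.0, 0.0, 0.0] for _ in range(width)] for _ in range(height)]
    w_sum = [[0.0] * width for _ in range(height)]
    for geometry, material in primitives:
        cx, cy, radius = geometry[:3]  # stand-in decoder: read the vector as a disk
        for y in range(height):
            for x in range(width):
                d = math.hypot(x + 0.5 - cx, y + 0.5 - cy)
                z = max(-50.0, min(50.0, 4.0 * (d - radius)))
                w = 1.0 / (1.0 + math.exp(z))  # soft coverage of the pixel center
                w_sum[y][x] += w
                for c in range(3):
                    rgb_sum[y][x][c] += w * material[c]
    # Resolve: coverage-weighted mean color, faded toward black where coverage < 1.
    image = []
    for y in range(height):
        row = []
        for x in range(width):
            total = w_sum[y][x]
            alpha = min(total, 1.0)
            row.append(tuple(alpha * s / total if total > 0 else 0.0
                             for s in rgb_sum[y][x]))
        image.append(row)
    return image


def translate(dx: float, dy: float) -> Op:
    """Stand-in geometry manipulator: shifts the disk center."""
    return lambda v: [v[0] + dx, v[1] + dy] + v[2:]


def desaturate(amount: float) -> Op:
    """Stand-in material manipulator: blends the RGB color toward its gray value."""
    def op(m: Vector) -> Vector:
        gray = sum(m[:3]) / 3.0
        return [(1.0 - amount) * c + amount * gray for c in m[:3]]
    return op


scene = Node(
    geometry_ops=[translate(1.0, 0.0)],  # inherited by both children
    children=[
        Node(geometry=[1.0, 1.0, 1.0], material=[1.0, 0.0, 0.0]),
        Node(geometry=[1.0, 3.0, 1.0], material=[0.0, 0.0, 1.0],
             material_ops=[desaturate(0.5)]),
    ],
)
primitives = list(collapse(scene))
image = stream_render(primitives, width=4, height=4)
print(primitives[0][0])  # [2.0, 1.0, 1.0]
```

This mirrors the rasterizer analogy in the text: the renderer consumes each primitive once and keeps only a fixed-size per-pixel accumulator. This toy accumulator also happens to be independent of primitive order, which may or may not hold for a learned renderer.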

3D animation with deforming neural shapes.

The analogy to the traditional scene graphs and rendering pipelines is not coincidental.

“Our goal is to eventually combine neural and classical scene primitives, and bringing the representations closer to each other is the first step on that path,” says Jan Novák, a co-author of the paper.

Such a combination will make it possible to extract scene elements from photographs using AI algorithms, combine them with classical graphics representations, and compose scenes and animations in a controlled manner.

The animations on this page illustrate the potential. The individual neural elements were learned from images of random static scenes. An artist then defined a sequence of scene graphs to produce a fluid animation composed of the learned elements. While there is still a long way to go before this approach reaches the visual quality and scene complexity of modern applications, the paper presents a feasible path toward bringing neural and classical rendering together. Once the two fully join forces, real-time photorealistic rendering could experience its next quantum leap.

Learn more: Check out the project website.