By Martin-Karl Lefrançois and Pascal Gautron

# NVIDIA Vulkan Ray Tracing Helpers: Introduction

The [Vulkan Ray Tracing Tutorial](vkrt_tutorial.md.htm) uses a small number of helper classes to bridge the gap between the principles of RTX and the actual implementation. These helpers make heavy use of the STL and modern C++ to reduce the code size to a minimum. The entirety of the helper code is present in this page, with no dependencies other than the Vulkan headers and the STL. The classes are completely independent from each other, and the helper methods directly contain the Vulkan code without further indirection.

This document describes the contents of these helpers, which have been designed to be usable either as is, or as code that can easily be extracted and integrated into existing applications. As such, the helpers contain the minimum amount of data, and leave it to the user to manage GPU activity: in particular, the helpers do not perform any GPU memory allocations. Since every application may have its own memory management, the helpers do not use any smart pointers, leaving the responsibility of pointer management to the application.

This document aims to provide information on the helpers underlying the tutorial, and does not claim to document the `VK_NV_ray_tracing` extension specification and usage exhaustively. Each section can be read independently, so some repetition occurs from one section to another.
The source files of the helper classes can be found here: [vkrayHelpers.zip](/rtx/raytracing/vkrt_helpers/files/vkrayHelpers.zip)

# Quick reference

* [`BottomLevelASGenerator`](#toc3): Generating the bottom-level acceleration structure (BLAS)
    * [`AddVertexBuffer`](#toc3.1): Add a vertex buffer to the geometry of the BLAS
    * [`ComputeASBufferSizes`](#toc3.2): Compute the amount of memory required to build the BLAS
    * [`Generate`](#toc3.3): Build the BLAS and store it into a user-provided buffer
* [`TopLevelASGenerator`](#toc4): Create and hold the acceleration structure of the scene
    * [`AddInstance`](#toc4.1): Add an instance to the acceleration structure
    * [`ComputeASBufferSizes`](#toc4.2): Compute the memory requirements to build the TLAS
    * [`Generate`](#toc4.3): Generate and store the TLAS
* [`DescriptorSetGenerator`](#toc5): Simple generation of descriptor sets
    * [`AddRangeParameter`](#toc5.1): Add a reference to a range of views within the active heap
    * [`AddHeapRangesParameter`](#toc5.2): Add an explicit reference to a buffer or constants
    * [`Generate`](#toc5.3): Generate the root signature from the parameters
* [`RayTracingPipelineGenerator`](#toc6): Assembling components to generate the ray tracing pipeline
    * [`AddLibrary`](#toc6.5): Add a DXIL library representing a shader program
    * [`AddHitGroup`](#toc6.6): Combine intersection, any hit and closest hit programs into a hit group
    * [`AddRootSignatureAssociation`](#toc6.6): Associate programs or hit groups to a root signature
    * [`SetMaxPayloadSize`, `SetMaxAttributeSize`, `SetMaxRecursionDepth`](#toc6.7): Set the global pipeline properties
    * [`Generate`](#toc6.11): Create the pipeline subobjects and generate the ray tracing pipeline
* [`ShaderBindingTableGenerator`](#toc7): Constructing the SBT associating geometry and shaders
    * [`AddRayGenerationProgram`](#toc7.2): Add a ray generation program and its resource pointers
    * [`AddMissProgram`](#toc7.3): Add a miss program and its resource pointers
    * [`AddHitGroup`](#toc7.4): Add
a hit group and its resource pointers
    * [`Generate`](#toc7.7): Generate the SBT
    * [`Reset`](#toc7.9): Remove all program and hit group references from the SBT
    * [`Getters`](#toc7.10): Access the size of the entries and SBT sections to facilitate the `vkCmdTraceRaysNV` setup

# Bottom-Level Acceleration Structure

The `BottomLevelASGenerator` class facilitates setting up the geometry to be used as input of the bottom-level acceleration structure (BLAS) builder. This bottom-level hierarchy is used to store the triangle data in a way suitable for fast ray-triangle intersection at runtime. To be built, this data structure requires some scratch space which has to be allocated by the application. Similarly, the resulting data structure is stored in an application-controlled buffer.

To use the class, the application must first add all the vertex buffers to be contained in the final structure, using `AddVertexBuffer`. After all buffers have been added, `CreateAccelerationStructure` creates the handle and `ComputeASBufferSizes` prepares the build, providing the required sizes for the scratch data and the final result. The `Generate` call finally computes the acceleration structure and stores it in the result buffer. Note that the build is only enqueued in the command buffer, meaning that the scratch buffer needs to be kept alive until the command buffer has finished executing.

Here is an example usage:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ C
// Add the vertex buffers (geometry)
BottomLevelASGenerator bottomLevelAS;
bottomLevelAS.AddVertexBuffer(vertexBuffer1, 0, vertexCount1, sizeof(Vertex), transformBuffer1, 0);
bottomLevelAS.AddVertexBuffer(vertexBuffer2, 0, vertexCount2, sizeof(Vertex), transformBuffer2, 0);
...
VkAccelerationStructureNV structure = bottomLevelAS.CreateAccelerationStructure(device, VK_FALSE);

// Find the size for the buffers
VkDeviceSize scratchSizeInBytes = 0;
VkDeviceSize resultSizeInBytes  = 0;
bottomLevelAS.ComputeASBufferSizes(device, structure, &scratchSizeInBytes, &resultSizeInBytes);

AccelerationStructureBuffers buffers;
buffers.scratchBuffer = nv_helpers_vk::CreateBuffer(..., scratchSizeInBytes, ...);
buffers.resultBuffer  = nv_helpers_vk::CreateBuffer(..., resultSizeInBytes, ...);

// Generate the acceleration structure. The scratch offset is 0, and
// buffers.resultMem is the device memory backing buffers.resultBuffer.
bottomLevelAS.Generate(device, commandBuffer, structure, buffers.scratchBuffer, 0,
                       buffers.resultBuffer, buffers.resultMem, VK_FALSE, VK_NULL_HANDLE);
return buffers;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This class contains a few members: the vector of geometry descriptors, the scratch and storage memory sizes computed by `ComputeASBufferSizes`, and the build flags indicating whether the geometry is dynamic or not.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/// Vertex buffer descriptors used to generate the AS
std::vector<VkGeometryNV> m_vertexBuffers = {};

/// Amount of temporary memory required by the builder
VkDeviceSize m_scratchSizeInBytes = 0;

/// Amount of memory required to store the AS
VkDeviceSize m_resultSizeInBytes = 0;

/// Flags for the builder, specifying whether to allow iterative updates, or
/// when to perform an update
VkBuildAccelerationStructureFlagsNV m_flags;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

## AddVertexBuffer

The `AddVertexBuffer` method adds a vertex buffer along with its index buffer in GPU memory into the acceleration structure. Each vertex position is expected to be represented by 3 float32 values. At this stage, the method creates a `VkGeometryNV` descriptor for the geometry, and adds it to the vector of geometries to combine within the BLAS.
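Since positions are fixed to three float32 values but may be interleaved with other attributes, the stride and offset given to `AddVertexBuffer` follow from the application's own vertex struct. The sketch below is only an illustration: the `Vertex` layout, `Float3` and `fetchPosition` are hypothetical names that are not part of the helpers, and the function emulates on the CPU the kind of strided read the builder performs on the GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical interleaved vertex layout, for illustration only.
struct Vertex
{
  float position[3];  // the only attribute read by the BLAS builder
  float normal[3];
  float uv[2];
};

// Values an application would pass as vertexSizeInBytes and vertexOffsetInBytes.
constexpr std::size_t kVertexStride   = sizeof(Vertex);
constexpr std::size_t kPositionOffset = offsetof(Vertex, position);

struct Float3
{
  float x, y, z;
};

// Reads position i from a raw byte buffer using an offset and a stride.
Float3 fetchPosition(const std::uint8_t* base, std::size_t offsetInBytes,
                     std::size_t strideInBytes, std::size_t i)
{
  assert(strideInBytes >= sizeof(Float3));  // a position must fit in one stride
  Float3 p;
  std::memcpy(&p, base + offsetInBytes + i * strideInBytes, sizeof(Float3));
  return p;
}
```

With this layout the stride is 32 bytes and the position offset is 0; placing the position after another attribute would only change `kPositionOffset`.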
Note that when adding geometry to the BLAS it is possible to pass a `transformBuffer`, which will contain a 4x4 transform matrix located at `transformOffsetInBytes`. This allows the application to combine multiple objects within a single BLAS, which is particularly useful to optimize performance on the static parts of the scene. If not provided, an identity matrix is assumed.

This implementation limits the original flexibility of the API:

* No custom intersector support, only triangles
* Vertex positions are in a 3xfloat32 format
* Indices are 32-bit values

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Add a vertex buffer along with its index buffer in GPU memory into the
// acceleration structure. The vertices are supposed to be represented by 3
// float32 values. This implementation limits the original flexibility of the
// API:
//   - triangles (no custom intersector support)
//   - 3xfloat32 format
//   - 32-bit indices
void BottomLevelASGenerator::AddVertexBuffer(
    VkBuffer vertexBuffer,  // Buffer containing the vertex coordinates,
                            // possibly interleaved with other vertex data
    VkDeviceSize vertexOffsetInBytes,  // Offset of the first vertex in the vertex buffer
    uint32_t vertexCount,              // Number of vertices to consider in the buffer
    VkDeviceSize vertexSizeInBytes,    // Size of a vertex including all its other data,
                                       // used to stride in the buffer
    VkBuffer indexBuffer,  // Buffer containing the vertex indices
                           // describing the triangles
    VkDeviceSize indexOffsetInBytes,  // Offset of the first index in the index buffer
    uint32_t indexCount,              // Number of indices to consider in the buffer
    VkBuffer transformBuffer,  // Buffer containing a 4x4 transform matrix
                               // in GPU memory, to be applied to the
                               // vertices. This buffer cannot be nullptr
    VkDeviceSize transformOffsetInBytes,  // Offset of the transform matrix in the
                                          // transform buffer
    bool isOpaque /* = true */  // If true, the geometry is considered opaque, optimizing the search
                                // for a closest hit
)
{
  VkGeometryNV geometry;
  geometry.sType                              = VK_STRUCTURE_TYPE_GEOMETRY_NV;
  geometry.pNext                              = nullptr;
  geometry.geometryType                       = VK_GEOMETRY_TYPE_TRIANGLES_NV;
  geometry.geometry.triangles.sType           = VK_STRUCTURE_TYPE_GEOMETRY_TRIANGLES_NV;
  geometry.geometry.triangles.pNext           = nullptr;
  geometry.geometry.triangles.vertexData      = vertexBuffer;
  geometry.geometry.triangles.vertexOffset    = vertexOffsetInBytes;
  geometry.geometry.triangles.vertexCount     = vertexCount;
  geometry.geometry.triangles.vertexStride    = vertexSizeInBytes;
  // Limitation to 3xfloat32 for vertices
  geometry.geometry.triangles.vertexFormat    = VK_FORMAT_R32G32B32_SFLOAT;
  geometry.geometry.triangles.indexData       = indexBuffer;
  geometry.geometry.triangles.indexOffset     = indexOffsetInBytes;
  geometry.geometry.triangles.indexCount      = indexCount;
  // Limitation to 32-bit indices
  geometry.geometry.triangles.indexType =
      indexBuffer != VK_NULL_HANDLE ? VK_INDEX_TYPE_UINT32 : VK_INDEX_TYPE_NONE_NV;
  geometry.geometry.triangles.transformData   = transformBuffer;
  geometry.geometry.triangles.transformOffset = transformOffsetInBytes;
  geometry.geometry.aabbs                     = {VK_STRUCTURE_TYPE_GEOMETRY_AABB_NV};
  geometry.flags                              = isOpaque ? VK_GEOMETRY_OPAQUE_BIT_NV : 0;

  m_vertexBuffers.push_back(geometry);
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

## CreateAccelerationStructure

To be able to build an acceleration structure holding the geometry, we first need to create a handle for it. That handle, created by `CreateAccelerationStructure`, only requires a flag indicating whether the acceleration structure will support dynamic updates, so that the builder can later optimize the structure accordingly.
Within the method, the construction of the handle also requires the number of geometry descriptors to insert in the BLAS. Therefore, `CreateAccelerationStructure` must be called after all geometry has been added.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Create the opaque acceleration structure descriptor, which will be used in the estimation of
// the AS size and the generation itself. The allowUpdate flag indicates if the AS will need
// dynamic refitting. This has to be called after adding all the geometry.
VkAccelerationStructureNV BottomLevelASGenerator::CreateAccelerationStructure(VkDevice device,
                                                                              VkBool32 allowUpdate)
{
  // The generated AS can support iterative updates. This may change the final
  // size of the AS as well as the temporary memory requirements, and hence has
  // to be set before the actual build
  m_flags = allowUpdate ? VK_BUILD_ACCELERATION_STRUCTURE_ALLOW_UPDATE_BIT_NV : 0;

  // Create the descriptor of the acceleration structure, which contains the number of geometry
  // descriptors it will contain
  VkAccelerationStructureInfoNV accelerationStructureInfo{
      VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_INFO_NV};
  accelerationStructureInfo.type  = VK_ACCELERATION_STRUCTURE_TYPE_BOTTOM_LEVEL_NV;
  accelerationStructureInfo.flags = m_flags;
  // The bottom-level AS can only contain explicit geometry, and no instances
  accelerationStructureInfo.instanceCount = 0;
  accelerationStructureInfo.geometryCount = static_cast<uint32_t>(m_vertexBuffers.size());
  accelerationStructureInfo.pGeometries   = m_vertexBuffers.data();

  VkAccelerationStructureCreateInfoNV accelerationStructureCreateInfo{
      VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_CREATE_INFO_NV};
  accelerationStructureCreateInfo.pNext         = nullptr;
  accelerationStructureCreateInfo.info          = accelerationStructureInfo;
  accelerationStructureCreateInfo.compactedSize = 0;

  VkAccelerationStructureNV accelerationStructure;
  VkResult code = vkCreateAccelerationStructureNV(device, &accelerationStructureCreateInfo, nullptr,
                                                  &accelerationStructure);
  if(code != VK_SUCCESS)
  {
    throw std::logic_error("vkCreateAccelerationStructureNV failed");
  }

  return accelerationStructure;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

## ComputeASBufferSizes

Once all the geometry has been added to the vector of geometry descriptors, we need to estimate two amounts of memory required to build the BLAS: the size of the scratch space, which is used as temporary storage during the build, and the size of the actual BLAS. This method returns both values, so that the application can allocate the appropriate amounts of memory.

The description of the work to be performed by the builder is attached to the `VkAccelerationStructureNV` handle created above. This handle is passed to `vkGetAccelerationStructureMemoryRequirementsNV`, which is queried once per requirement type to obtain the required amounts of storage and scratch memory. The required sizes are returned so that the application can allocate the buffers before calling the BLAS builder.

Note that the ray tracing API makes a distinction between scratch buffer requirements for build and update operations. For simplicity, our helper implementation returns a single scratch buffer size, which is the greater of the build and update scratch buffer sizes.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Compute the size of the scratch space required to build the acceleration
// structure, as well as the size of the resulting structure. The allocation of
// the buffers is then left to the application
void BottomLevelASGenerator::ComputeASBufferSizes(
    VkDevice device,  // Device on which the build will be performed
    VkAccelerationStructureNV accelerationStructure,
    VkDeviceSize* scratchSizeInBytes,  // Required scratch memory on the GPU to build
                                       // the acceleration structure
    VkDeviceSize* resultSizeInBytes    // Required GPU memory to store the acceleration
                                       // structure
)
{
  // Create a descriptor for the memory requirements, and provide the acceleration structure
  // descriptor
  VkAccelerationStructureMemoryRequirementsInfoNV memoryRequirementsInfo;
  memoryRequirementsInfo.sType =
      VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_MEMORY_REQUIREMENTS_INFO_NV;
  memoryRequirementsInfo.pNext                 = nullptr;
  memoryRequirementsInfo.accelerationStructure = accelerationStructure;
  memoryRequirementsInfo.type = VK_ACCELERATION_STRUCTURE_MEMORY_REQUIREMENTS_TYPE_OBJECT_NV;

  // This descriptor already contains the geometry info, so we can directly compute the estimated AS
  // size and required scratch memory. The output structure needs its sType set before the query.
  VkMemoryRequirements2 memoryRequirements{VK_STRUCTURE_TYPE_MEMORY_REQUIREMENTS_2};
  vkGetAccelerationStructureMemoryRequirementsNV(device, &memoryRequirementsInfo,
                                                 &memoryRequirements);

  // Size of the resulting AS
  m_resultSizeInBytes = memoryRequirements.memoryRequirements.size;

  // Store the memory requirements for use during build/update
  memoryRequirementsInfo.type =
      VK_ACCELERATION_STRUCTURE_MEMORY_REQUIREMENTS_TYPE_BUILD_SCRATCH_NV;
  vkGetAccelerationStructureMemoryRequirementsNV(device, &memoryRequirementsInfo,
                                                 &memoryRequirements);
  m_scratchSizeInBytes = memoryRequirements.memoryRequirements.size;

  memoryRequirementsInfo.type =
      VK_ACCELERATION_STRUCTURE_MEMORY_REQUIREMENTS_TYPE_UPDATE_SCRATCH_NV;
  vkGetAccelerationStructureMemoryRequirementsNV(device, &memoryRequirementsInfo,
                                                 &memoryRequirements);
  // Keep the greater of the build and update scratch sizes
  m_scratchSizeInBytes = m_scratchSizeInBytes > memoryRequirements.memoryRequirements.size ?
                             m_scratchSizeInBytes :
                             memoryRequirements.memoryRequirements.size;

  *resultSizeInBytes  = m_resultSizeInBytes;
  *scratchSizeInBytes = m_scratchSizeInBytes;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

## Generate

The BLAS builder `Generate` takes as input the scratch and storage buffers, whose sizes have been computed above. Note that these buffers must be allocated in device-local memory (`VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT`) and created with the `VK_BUFFER_USAGE_RAY_TRACING_BIT_NV` usage flag before calling `Generate`. The `Generate` call is the only one actually performing any GPU work, hence it requires a command buffer.

If the BLAS is dynamic, then after its first build it is possible to set the `updateOnly` parameter; in that case the current BLAS must also be provided as `previousResult`. The update can be done in place or not, so it is possible to have `previousResult == accelerationStructure`.

Whether the BLAS is dynamic or not has been indicated in the `CreateAccelerationStructure` method. This allows us to partially check the consistency between the `CreateAccelerationStructure` and `Generate` calls.

The first step of the method binds the acceleration structure memory to the acceleration structure handle using `vkBindAccelerationStructureMemoryNV`, in a way similar to the binding of memory to a buffer. The builder work is described in the arguments to `vkCmdBuildAccelerationStructureNV`, which in particular provide the set of geometries to add and the target buffers. Since the BLAS may be used later within the same command buffer, the helper records a barrier to ensure the build is finished before further commands are processed.
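The consistency rules described above can be checked before any Vulkan call is issued. The following is a minimal sketch rather than the helper's own code: `BuildRequest` and `validateBuildRequest` are hypothetical names, and `kAllowUpdateBit` is a local stand-in for `VK_BUILD_ACCELERATION_STRUCTURE_ALLOW_UPDATE_BIT_NV` so that the example builds without the Vulkan headers. It tests the flag with a bit mask, which would keep working if other build flags were combined with it.

```cpp
#include <cstdint>
#include <stdexcept>

// Assumption: local stand-in for VK_BUILD_ACCELERATION_STRUCTURE_ALLOW_UPDATE_BIT_NV,
// defined here only so the sketch compiles without vulkan.h.
constexpr std::uint32_t kAllowUpdateBit = 0x1u;

// Hypothetical bundle of the values a build call needs to cross-check.
struct BuildRequest
{
  std::uint32_t flags;        // flags chosen when the AS handle was created
  bool updateOnly;            // refit instead of performing a full build
  bool hasPreviousResult;     // a previous acceleration structure was supplied
  std::uint64_t resultSize;   // as returned by ComputeASBufferSizes
  std::uint64_t scratchSize;  // as returned by ComputeASBufferSizes
};

// Throws std::logic_error if the request is inconsistent.
void validateBuildRequest(const BuildRequest& r)
{
  if(r.updateOnly && (r.flags & kAllowUpdateBit) == 0)
    throw std::logic_error("AS was not created with update support");
  if(r.updateOnly && !r.hasPreviousResult)
    throw std::logic_error("An update requires the previous AS");
  if(r.resultSize == 0 || r.scratchSize == 0)
    throw std::logic_error("ComputeASBufferSizes must be called first");
}
```

Keeping such checks separate from the Vulkan calls makes them easy to unit test on a machine without a ray tracing capable GPU.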
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Enqueue the construction of the acceleration structure on a command list, using
// application-provided buffers and possibly a pointer to the previous acceleration structure in
// case of iterative updates. Note that the update can be done in place: the result and
// previousResult pointers can be the same.
void BottomLevelASGenerator::Generate(
    VkDevice device,
    VkCommandBuffer commandList,  // Command list on which the build will be enqueued
    VkAccelerationStructureNV accelerationStructure,
    VkBuffer scratchBuffer,      // Scratch buffer used by the builder to
                                 // store temporary data
    VkDeviceSize scratchOffset,  // Offset in the scratch buffer at which the builder can start
                                 // writing memory
    VkBuffer resultBuffer,       // Result buffer storing the acceleration structure
    VkDeviceMemory resultMem,
    VkBool32 updateOnly,  // If true, simply refit the existing
                          // acceleration structure
    VkAccelerationStructureNV previousResult  // Optional previous acceleration
                                              // structure, used if an iterative update
                                              // is requested
)
{
  // Sanity checks
  if(m_flags != VK_BUILD_ACCELERATION_STRUCTURE_ALLOW_UPDATE_BIT_NV && updateOnly)
  {
    throw std::logic_error("Cannot update a bottom-level AS not originally built for updates");
  }
  if(updateOnly && previousResult == VK_NULL_HANDLE)
  {
    throw std::logic_error("Bottom-level hierarchy update requires the previous hierarchy");
  }
  if(m_resultSizeInBytes == 0 || m_scratchSizeInBytes == 0)
  {
    throw std::logic_error(
        "Invalid scratch and result buffer sizes - ComputeASBufferSizes needs "
        "to be called before Generate");
  }

  // Bind the acceleration structure descriptor to the actual memory that will contain it
  VkBindAccelerationStructureMemoryInfoNV bindInfo;
  bindInfo.sType                 = VK_STRUCTURE_TYPE_BIND_ACCELERATION_STRUCTURE_MEMORY_INFO_NV;
  bindInfo.pNext                 = nullptr;
  bindInfo.accelerationStructure = accelerationStructure;
  bindInfo.memory                = resultMem;
  bindInfo.memoryOffset          = 0;
  bindInfo.deviceIndexCount      = 0;
  bindInfo.pDeviceIndices        = nullptr;

  VkResult code = vkBindAccelerationStructureMemoryNV(device, 1, &bindInfo);
  if(code != VK_SUCCESS)
  {
    throw std::logic_error("vkBindAccelerationStructureMemoryNV failed");
  }

  // Build the actual bottom-level acceleration structure
  VkAccelerationStructureInfoNV buildInfo;
  buildInfo.sType         = VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_INFO_NV;
  buildInfo.pNext         = nullptr;
  buildInfo.flags         = m_flags;
  buildInfo.type          = VK_ACCELERATION_STRUCTURE_TYPE_BOTTOM_LEVEL_NV;
  buildInfo.geometryCount = static_cast<uint32_t>(m_vertexBuffers.size());
  buildInfo.pGeometries   = m_vertexBuffers.data();
  buildInfo.instanceCount = 0;

  vkCmdBuildAccelerationStructureNV(commandList, &buildInfo, VK_NULL_HANDLE, 0, updateOnly,
                                    accelerationStructure,
                                    updateOnly ? previousResult : VK_NULL_HANDLE, scratchBuffer,
                                    scratchOffset);

  // Wait for the builder to complete by setting a barrier on the resulting buffer. This is
  // particularly important as the construction of the top-level hierarchy may be called right
  // afterwards, before executing the command list.
  VkMemoryBarrier memoryBarrier;
  memoryBarrier.sType = VK_STRUCTURE_TYPE_MEMORY_BARRIER;
  memoryBarrier.pNext = nullptr;
  memoryBarrier.srcAccessMask =
      VK_ACCESS_ACCELERATION_STRUCTURE_WRITE_BIT_NV | VK_ACCESS_ACCELERATION_STRUCTURE_READ_BIT_NV;
  memoryBarrier.dstAccessMask =
      VK_ACCESS_ACCELERATION_STRUCTURE_WRITE_BIT_NV | VK_ACCESS_ACCELERATION_STRUCTURE_READ_BIT_NV;

  vkCmdPipelineBarrier(commandList, VK_PIPELINE_STAGE_ACCELERATION_STRUCTURE_BUILD_BIT_NV,
                       VK_PIPELINE_STAGE_ACCELERATION_STRUCTURE_BUILD_BIT_NV, 0, 1, &memoryBarrier,
                       0, nullptr, 0, nullptr);
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

# Top-Level Acceleration Structure

The `TopLevelASGenerator` class embeds the code required to compute the top-level acceleration structure (TLAS), which binds together a set of BLAS described in the section above.
The top-level hierarchy is used to store a set of instances represented by bottom-level hierarchies in a way suitable for fast intersection at runtime. To be built, this data structure requires some scratch space which has to be allocated by the application. Similarly, the resulting data structure is stored in an application-controlled buffer.

To use the class, the application must first add all the instances to be contained in the final structure, using `AddInstance`. After all instances have been added, `CreateAccelerationStructure` creates the handle and `ComputeASBufferSizes` prepares the build, providing the required sizes for the scratch data, the instance descriptors and the final result. The `Generate` call finally computes the acceleration structure and stores it in the result buffer. Note that the build is only enqueued in the command buffer, meaning that the scratch buffer needs to be kept alive until the command buffer has finished executing.

Here is an example usage from the tutorial:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Add all instances of the scene
TopLevelASGenerator topLevelAS;
topLevelAS.AddInstance(instances1, matrix1, instanceId1, hitGroupIndex1);
topLevelAS.AddInstance(instances2, matrix2, instanceId2, hitGroupIndex2);
...
VkAccelerationStructureNV structure = topLevelAS.CreateAccelerationStructure(device, VK_TRUE);

// Find the size of the buffers to store the AS
VkDeviceSize scratchSize, resultSize, instanceDescsSize;
topLevelAS.ComputeASBufferSizes(device, structure, &scratchSize, &resultSize, &instanceDescsSize);

// Create the AS buffers
AccelerationStructureBuffers buffers;
buffers.scratchBuffer   = nv_helpers_vk::CreateBuffer(..., scratchSize, ...);
buffers.resultBuffer    = nv_helpers_vk::CreateBuffer(..., resultSize, ...);
buffers.instancesBuffer = nv_helpers_vk::CreateBuffer(..., instanceDescsSize, ...);

// Generate the top level acceleration structure
topLevelAS.Generate(device, commandBuffer, structure, buffers.scratchBuffer, buffers.resultBuffer,
                    buffers.resultMem, buffers.instancesBuffer, buffers.instancesMem, updateOnly,
                    updateOnly ? structure : VK_NULL_HANDLE);

return buffers;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The `TopLevelASGenerator` class contains an internal structure to store the description of the instances, namely a handle to the corresponding BLAS, a transform matrix, the instance index accessible as `gl_InstanceCustomIndexNV` in the GLSL code, and the hit group index defining the first hit group of the Shader Binding Table corresponding to that particular instance.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/// Helper struct storing the instance data
struct Instance
{
  Instance(VkAccelerationStructureNV blAS, const glm::mat4x4& tr, uint32_t iID, uint32_t hgId);
  /// Bottom-level AS
  VkAccelerationStructureNV bottomLevelAS;
  /// Transform matrix
  const glm::mat4x4 transform;
  /// Instance ID visible in the shader
  uint32_t instanceID;
  /// Hit group index used to fetch the shaders from the SBT
  uint32_t hitGroupIndex;
};
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The instances are stored in a vector for later use:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/// Instances contained in the top-level AS
std::vector<Instance> m_instances;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When passing instance data to the `VK_NV_ray_tracing` extension, instances need to follow a precise 64-byte data layout. While somewhat redundant with the above instance definition, we made the choice of keeping both so that glm matrices are kept intact, simplifying operations on the application side.
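The 64 bytes break down into 48 bytes of transform (12 floats), two 32-bit words that are each split into a 24-bit and an 8-bit field, and a 64-bit handle. As a sketch of what the bitfield split means, the hypothetical `pack24_8` function below builds such a word with explicit shifts, assuming the 24-bit field occupies the least significant bits and the 8-bit field the most significant byte, which is how common little-endian compilers lay out bitfields declared in that order.

```cpp
#include <cstddef>
#include <cstdint>

// Packs a 24-bit value and an 8-bit value into one 32-bit word: the 24-bit
// field sits in the low bits, the 8-bit field in the top byte. Extra bits in
// either input are masked off.
constexpr std::uint32_t pack24_8(std::uint32_t low24, std::uint32_t high8)
{
  return (low24 & 0x00FFFFFFu) | ((high8 & 0xFFu) << 24);
}

// Byte budget of one instance descriptor.
constexpr std::size_t kTransformBytes   = 12 * sizeof(float);         // 3x4 matrix
constexpr std::size_t kPackedWordsBytes = 2 * sizeof(std::uint32_t);  // id/mask, offset/flags
constexpr std::size_t kHandleBytes      = sizeof(std::uint64_t);      // BLAS handle

static_assert(kTransformBytes + kPackedWordsBytes + kHandleBytes == 64,
              "an instance descriptor is expected to occupy 64 bytes");
```

For instance, an instance index of `0x123456` combined with a visibility mask of `0xFF` yields the word `0xFF123456`.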
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/// Geometry instance, with the layout expected by VK_NV_ray_tracing
struct VkGeometryInstance
{
  /// Transform matrix, containing only the top 3 rows
  float transform[12];
  /// Instance index
  uint32_t instanceId : 24;
  /// Visibility mask
  uint32_t mask : 8;
  /// Index of the hit group which will be invoked when a ray hits the instance
  uint32_t instanceOffset : 24;
  /// Instance flags, such as culling
  uint32_t flags : 8;
  /// Opaque handle of the bottom-level acceleration structure
  uint64_t accelerationStructureHandle;
};

static_assert(sizeof(VkGeometryInstance) == 64,
              "VkGeometryInstance structure compiles to incorrect size");
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The class also contains a flag indicating whether the TLAS supports dynamic updates or not, and the amounts of memory required by the builder: the scratch memory to store temporary data during the build only, the size of the buffer containing the description of the instances, and the final buffer containing the TLAS itself.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/// Construction flags, indicating whether the AS supports iterative updates
VkBuildAccelerationStructureFlagsNV m_flags;
/// Size of the temporary memory used by the TLAS builder
VkDeviceSize m_scratchSizeInBytes;
/// Size of the buffer containing the instance descriptors
VkDeviceSize m_instanceDescsSizeInBytes;
/// Size of the buffer containing the TLAS
VkDeviceSize m_resultSizeInBytes;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

## AddInstance

This method adds an instance to the top-level acceleration structure.
The instance is represented by a bottom-level AS, a transform, an instance ID and the index of the hit group indicating which shaders are executed upon hitting any geometry within the instance. It simply appends the instance data to the vector of instances.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Add an instance to the top-level acceleration structure. The instance is
// represented by a bottom-level AS, a transform, an instance ID and the index
// of the hit group indicating which shaders are executed upon hitting any
// geometry within the instance
void TopLevelASGenerator::AddInstance(
    VkAccelerationStructureNV bottomLevelAS,  // Bottom-level acceleration structure containing the
                                              // actual geometric data of the instance
    const glm::mat4x4& transform,  // Transform matrix to apply to the instance, allowing the
                                   // same bottom-level AS to be used at several world-space
                                   // positions
    uint32_t instanceID,     // Instance ID, which can be used in the shaders to
                             // identify this specific instance
    uint32_t hitGroupIndex   // Hit group index, corresponding to the index of the
                             // hit group in the Shader Binding Table that will be
                             // invoked upon hitting the geometry
)
{
  m_instances.emplace_back(Instance(bottomLevelAS, transform, instanceID, hitGroupIndex));
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

## CreateAccelerationStructure

Once all instances have been added, we need to create the handle to the acceleration structure that will hold the information. The handle creation requires a flag indicating whether the acceleration structure will support dynamic updates, so that the builder can optimize the structure accordingly. Note that while the bottom-level AS carries the information on the geometries, the top-level AS carries the instance information.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Create the opaque acceleration structure descriptor, which will be used in the estimation of
// the AS size and the generation itself. The allowUpdate flag indicates if the AS will need
// dynamic refitting. This has to be called after adding all the instances.
VkAccelerationStructureNV TopLevelASGenerator::CreateAccelerationStructure(
    VkDevice device,
    VkBool32 allowUpdate /* = VK_FALSE */)
{
  // The generated AS can support iterative updates. This may change the final
  // size of the AS as well as the temporary memory requirements, and hence has
  // to be set before the actual build
  m_flags = allowUpdate ? VK_BUILD_ACCELERATION_STRUCTURE_ALLOW_UPDATE_BIT_NV : 0;

  VkAccelerationStructureInfoNV info{VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_INFO_NV};
  info.type  = VK_ACCELERATION_STRUCTURE_TYPE_TOP_LEVEL_NV;
  info.flags = m_flags;
  // The descriptor already contains the number of instances
  info.instanceCount = static_cast<uint32_t>(m_instances.size());
  // Since this is a top-level AS, it does not contain any geometry
  info.geometryCount = 0;
  info.pGeometries   = nullptr;

  VkAccelerationStructureCreateInfoNV accelerationStructureInfo{
      VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_CREATE_INFO_NV};
  accelerationStructureInfo.sType = VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_CREATE_INFO_NV;
  accelerationStructureInfo.info  = info;
  accelerationStructureInfo.pNext = nullptr;
  accelerationStructureInfo.compactedSize = 0;

  VkAccelerationStructureNV accelerationStructure;
  VkResult code = vkCreateAccelerationStructureNV(device, &accelerationStructureInfo, nullptr,
                                                  &accelerationStructure);
  if(code != VK_SUCCESS)
  {
    throw std::logic_error("vkCreateAccelerationStructureNV failed");
  }

  return accelerationStructure;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

## ComputeASBufferSizes

Once all instances have been added to the vector of instance
descriptors, we need to estimate 3 amounts of memory required to build the TLAS: the size of the scratch space, which is used as temporary storage during the build, the size of the buffer holding the instance descriptors, and the size of the actual TLAS. This method returns both values, so that the application can allocate the appropriate amounts of memory. The description of the work to be performed by the builder is provided in the `VkAccelerationStructureNV` structure. It provides the number of instances, and a flag indicating whether the AS will be static or possibly updated over time. This flag is stored in the helper for later use during the build. This information is then passed to `vkGetAccelerationStructureMemoryRequirementsNV`, which provides the required amounts of scratch and storage memory. The size of the instance descriptor buffer is simply given by the number of instances and the size of the `VkGeometryInstance` structure. The required sizes are returned so that the application can allocate the buffers before calling the TLAS builder. See the `Generate` section for the requirements on the buffers themselves. Note that the ray tracing API makes a distinction between scratch buffer requirements for build vs. update operations. For simplicity, our helper implementation return a single scratch buffer size, which is the greater of build and update scratch buffer sizes. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ // Compute the size of the scratch space required to build the acceleration // structure, as well as the size of the resulting structure. 
// The allocation of the buffers is then left to the application
void TopLevelASGenerator::ComputeASBufferSizes(
    VkDevice device, /* Device on which the build will be performed */
    VkAccelerationStructureNV accelerationStructure,
    VkDeviceSize* scratchSizeInBytes,  /* Required scratch memory on the GPU to build
                                          the acceleration structure */
    VkDeviceSize* resultSizeInBytes,   /* Required GPU memory to store the
                                          acceleration structure */
    VkDeviceSize* instancesSizeInBytes /* Required GPU memory to store instance
                                          descriptors, containing the matrices,
                                          indices etc. */
)
{
  // Create a descriptor indicating which memory requirements we want to obtain
  VkAccelerationStructureMemoryRequirementsInfoNV memoryRequirementsInfo;
  memoryRequirementsInfo.sType =
      VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_MEMORY_REQUIREMENTS_INFO_NV;
  memoryRequirementsInfo.pNext = nullptr;
  memoryRequirementsInfo.type = VK_ACCELERATION_STRUCTURE_MEMORY_REQUIREMENTS_TYPE_OBJECT_NV;
  memoryRequirementsInfo.accelerationStructure = accelerationStructure;

  // Query the memory requirements.
  // Note that the number of instances in the AS has already
  // been provided when creating the AS descriptor.
  // The output structure is zero-initialized and tagged with its sType before the query.
  VkMemoryRequirements2 memoryRequirements = {};
  memoryRequirements.sType = VK_STRUCTURE_TYPE_MEMORY_REQUIREMENTS_2;
  vkGetAccelerationStructureMemoryRequirementsNV(device, &memoryRequirementsInfo,
                                                 &memoryRequirements);

  // Size of the resulting acceleration structure
  m_resultSizeInBytes = memoryRequirements.memoryRequirements.size;

  // Store the memory requirements for use during build
  memoryRequirementsInfo.type =
      VK_ACCELERATION_STRUCTURE_MEMORY_REQUIREMENTS_TYPE_BUILD_SCRATCH_NV;
  vkGetAccelerationStructureMemoryRequirementsNV(device, &memoryRequirementsInfo,
                                                 &memoryRequirements);
  m_scratchSizeInBytes = memoryRequirements.memoryRequirements.size;

  // Keep the greater of the build and update scratch sizes
  memoryRequirementsInfo.type =
      VK_ACCELERATION_STRUCTURE_MEMORY_REQUIREMENTS_TYPE_UPDATE_SCRATCH_NV;
  vkGetAccelerationStructureMemoryRequirementsNV(device, &memoryRequirementsInfo,
                                                 &memoryRequirements);
  m_scratchSizeInBytes = m_scratchSizeInBytes > memoryRequirements.memoryRequirements.size ?
                             m_scratchSizeInBytes :
                             memoryRequirements.memoryRequirements.size;

  *resultSizeInBytes  = m_resultSizeInBytes;
  *scratchSizeInBytes = m_scratchSizeInBytes;

  // Amount of memory required to store the instance descriptors
  m_instanceDescsSizeInBytes = m_instances.size() * sizeof(VkGeometryInstance);
  *instancesSizeInBytes      = m_instanceDescsSizeInBytes;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

## Generate

The TLAS builder `Generate` takes as input the scratch, instance descriptor and storage buffers, whose sizes have been computed above. The scratch and storage buffers must reside in device memory (`VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT`) and be created with the `VK_BUFFER_USAGE_RAY_TRACING_BIT_NV` usage flag before calling `Generate`. The instance descriptor buffer must be in host-visible, host-coherent memory (`VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT`) as it will be mapped within the `Generate` method.
The `Generate` call is the only one actually performing any GPU work, hence it requires a command buffer. If the TLAS is dynamic and has already been built once, it is possible to set the `updateOnly` parameter and, in this case, also provide the handle of the current TLAS in `previousResult`. The update can be done in place or not, so it is possible to have `previousResult == accelerationStructure`. Whether the TLAS is dynamic or not has been indicated when calling `CreateAccelerationStructure`; the flag stored at that point is reused by `Generate`, which keeps the two calls consistent.

`Generate` first sets up the instance data, and maps the instance descriptor buffer to copy the instance data into it. It then binds the acceleration structure memory to the acceleration structure handle using `vkBindAccelerationStructureMemoryNV`, in a way similar to the binding of physical memory to buffers. The actual builder work is described in the arguments of `vkCmdBuildAccelerationStructureNV`, which in particular provide the set of instances to add and the scratch, instances and result buffers. In case the TLAS is used directly within the same command buffer, the helper contains a barrier to ensure the build is finished before processing further commands.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Enqueue the construction of the acceleration structure on a command buffer,
// using application-provided buffers and possibly a handle to the previous
// acceleration structure in case of iterative updates. Note that the update can
// be done in place: the result and previousResult descriptors can be the same.
void TopLevelASGenerator::Generate(
    VkDevice device,                // Device on which the generation will be performed
    VkCommandBuffer commandBuffer,  // Command buffer on which the build will be enqueued
    VkAccelerationStructureNV accelerationStructure,
    VkBuffer scratchBuffer,      // Scratch buffer used by the builder to
                                 // store temporary data
    VkDeviceSize scratchOffset,  // Offset in the scratch buffer at which the builder
                                 // can start writing memory
    VkBuffer resultBuffer,       // Result buffer storing the acceleration structure
    VkDeviceMemory resultMem,
    VkBuffer instancesBuffer,  // Auxiliary buffer containing the instance
                               // descriptors, has to be in host-visible memory
    VkDeviceMemory instancesMem,
    VkBool32 updateOnly /*= false*/,  // If true, simply refit the
                                      // existing acceleration structure
    VkAccelerationStructureNV previousResult /*= VK_NULL_HANDLE*/  // Optional previous acceleration
                                                                   // structure, used if an iterative
                                                                   // update is requested
)
{
  // For each instance, build the corresponding instance descriptor
  std::vector<VkGeometryInstance> geometryInstances;
  for(const auto& inst : m_instances)
  {
    uint64_t accelerationStructureHandle = 0;
    VkResult code = vkGetAccelerationStructureHandleNV(device, inst.bottomLevelAS, sizeof(uint64_t),
                                                       &accelerationStructureHandle);
    if(code != VK_SUCCESS)
    {
      throw std::logic_error("vkGetAccelerationStructureHandleNV failed");
    }

    VkGeometryInstance gInst;
    glm::mat4x4 transp = glm::transpose(inst.transform);
    memcpy(gInst.transform, &transp, sizeof(gInst.transform));
    gInst.instanceId = inst.instanceID;
    // The visibility mask is always set to 0xFF, but if some instances would need to be ignored in
    // some cases, this flag should be passed by the application
    gInst.mask = 0xff;
    // Set the hit group index, that will be used to find the shader code to execute when hitting
    // the geometry
    gInst.instanceOffset = inst.hitGroupIndex;
    // Disable culling - more fine control could be provided by the application
    gInst.flags = VK_GEOMETRY_INSTANCE_TRIANGLE_CULL_DISABLE_BIT_NV;
    gInst.accelerationStructureHandle = accelerationStructureHandle;
    geometryInstances.push_back(gInst);
  }

  // Copy the instance descriptors into the provided mappable buffer
  VkDeviceSize instancesBufferSize = geometryInstances.size() * sizeof(VkGeometryInstance);
  void* data;
  vkMapMemory(device, instancesMem, 0, instancesBufferSize, 0, &data);
  memcpy(data, geometryInstances.data(), instancesBufferSize);
  vkUnmapMemory(device, instancesMem);

  // Bind the acceleration structure descriptor to the actual memory that will store the AS itself
  VkBindAccelerationStructureMemoryInfoNV bindInfo;
  bindInfo.sType = VK_STRUCTURE_TYPE_BIND_ACCELERATION_STRUCTURE_MEMORY_INFO_NV;
  bindInfo.pNext = nullptr;
  bindInfo.accelerationStructure = accelerationStructure;
  bindInfo.memory = resultMem;
  bindInfo.memoryOffset = 0;
  bindInfo.deviceIndexCount = 0;
  bindInfo.pDeviceIndices = nullptr;

  VkResult code = vkBindAccelerationStructureMemoryNV(device, 1, &bindInfo);
  if(code != VK_SUCCESS)
  {
    throw std::logic_error("vkBindAccelerationStructureMemoryNV failed");
  }

  // Build the acceleration structure and store it in the result memory
  VkAccelerationStructureInfoNV buildInfo;
  buildInfo.sType = VK_STRUCTURE_TYPE_ACCELERATION_STRUCTURE_INFO_NV;
  buildInfo.pNext = nullptr;
  buildInfo.flags = m_flags;
  buildInfo.type = VK_ACCELERATION_STRUCTURE_TYPE_TOP_LEVEL_NV;
  buildInfo.instanceCount = static_cast<uint32_t>(geometryInstances.size());
  buildInfo.geometryCount = 0;
  buildInfo.pGeometries = nullptr;

  vkCmdBuildAccelerationStructureNV(commandBuffer, &buildInfo, instancesBuffer, 0, updateOnly,
                                    accelerationStructure, updateOnly ?
                                    previousResult : VK_NULL_HANDLE, scratchBuffer, scratchOffset);

  // Ensure that the build will be finished before using the AS using a barrier
  VkMemoryBarrier memoryBarrier;
  memoryBarrier.sType = VK_STRUCTURE_TYPE_MEMORY_BARRIER;
  memoryBarrier.pNext = nullptr;
  memoryBarrier.srcAccessMask =
      VK_ACCESS_ACCELERATION_STRUCTURE_WRITE_BIT_NV | VK_ACCESS_ACCELERATION_STRUCTURE_READ_BIT_NV;
  memoryBarrier.dstAccessMask =
      VK_ACCESS_ACCELERATION_STRUCTURE_WRITE_BIT_NV | VK_ACCESS_ACCELERATION_STRUCTURE_READ_BIT_NV;

  vkCmdPipelineBarrier(commandBuffer, VK_PIPELINE_STAGE_ACCELERATION_STRUCTURE_BUILD_BIT_NV,
                       VK_PIPELINE_STAGE_ACCELERATION_STRUCTURE_BUILD_BIT_NV, 0, 1, &memoryBarrier,
                       0, nullptr, 0, nullptr);
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

# Descriptor Set Generator

The `DescriptorSetGenerator` class is not directly related to Vulkan ray tracing, but applies to Vulkan in general to simplify writing descriptor pools, layouts and sets by allowing the user to iteratively add components. In the context of the `VK_NV_ray_tracing` extension, the order in which the addition methods are called is important as it will directly map to the Shader Binding Table entries to which buffer pointers will be bound.
Example to create an empty descriptor pool, layout and set:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
nv_helpers_vk::DescriptorSetGenerator dsg;
VkDescriptorPool pool = dsg.GeneratePool(device);
VkDescriptorSetLayout layout = dsg.GenerateLayout(device);
VkDescriptorSet set = dsg.GenerateSet(device, pool, layout);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Example to create a descriptor set with one uniform buffer, bound to index 0, and accessible from a ray generation shader:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
nv_helpers_vk::DescriptorSetGenerator dsg;
dsg.AddBinding(0, 1, VK_DESCRIPTOR_TYPE_UNIFORM_BUFFER, VK_SHADER_STAGE_RAYGEN_BIT_NV);
VkDescriptorPool pool = dsg.GeneratePool(device);
VkDescriptorSetLayout layout = dsg.GenerateLayout(device);
VkDescriptorSet set = dsg.GenerateSet(device, pool, layout);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

## WriteInfo

The core of the binding of the `DescriptorSetGenerator` class is done in the nested `WriteInfo` structure, which stores a number of `VkWriteDescriptorSet` for a given type of descriptor. Depending on the type of the descriptor information, that information needs to be written to a different field of the `VkWriteDescriptorSet` structure: for example, the `VkDescriptorBufferInfo` of a buffer needs to be written in the `pBufferInfo` field of the `VkWriteDescriptorSet`, while a `VkDescriptorImageInfo` gets written to `pImageInfo`. The second template parameter of the class is then the offset of the target structure member within `VkWriteDescriptorSet`.
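The offset mechanism can be tried in isolation. The sketch below is an illustration only, not the helper's code: `FakeWrite` is a hypothetical stand-in for `VkWriteDescriptorSet` with two pointer members, and `PointerPatcher` is a hypothetical reduction of the idea, which stores the data in vectors and patches the selected pointer member once all entries have been added.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for VkWriteDescriptorSet: one pointer member per kind of data
struct FakeWrite
{
  uint32_t     binding;
  const int*   pInts;
  const float* pFloats;
};

// T is the type of data to reference, offset is the byte offset of the pointer member
// of FakeWrite that must receive the address of the data
template <typename T, size_t offset>
struct PointerPatcher
{
  std::vector<FakeWrite>      writes;
  std::vector<std::vector<T>> contents;

  // Record an entry; the pointer members stay null until SetPointers is called
  void Add(uint32_t binding, const std::vector<T>& data)
  {
    writes.push_back(FakeWrite{binding, nullptr, nullptr});
    contents.push_back(data);
  }

  // Write the address of each data vector into the member located at 'offset'.
  // No type checking is possible here: a wrong offset silently corrupts the structure.
  void SetPointers()
  {
    assert(writes.size() == contents.size());
    for(size_t i = 0; i < writes.size(); i++)
    {
      uint8_t*  base = reinterpret_cast<uint8_t*>(&writes[i]);
      const T** dest = reinterpret_cast<const T**>(base + offset);
      *dest          = contents[i].data();
    }
  }
};

using IntPatcher   = PointerPatcher<int, offsetof(FakeWrite, pInts)>;
using FloatPatcher = PointerPatcher<float, offsetof(FakeWrite, pFloats)>;
```

An `IntPatcher` only ever touches `pInts` and a `FloatPatcher` only touches `pFloats`, which is the same separation the text describes for `pBufferInfo` and `pImageInfo`.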
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/// Store the information to write into one descriptor set entry: the number of descriptors of the
/// entry, and where in the descriptor the buffer information should be written
template <typename T, uint32_t offset>
struct WriteInfo
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This class contains two members: a vector of `VkWriteDescriptorSet`, and another vector containing the actual data to write into each `VkWriteDescriptorSet`. Since each `VkWriteDescriptorSet` can refer to an arbitrary number of descriptors, each entry of the vector is itself a vector of descriptor information structures.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/// Write descriptors
std::vector<VkWriteDescriptorSet> writeDesc;
/// Contents to write in one of the info members of the descriptor
std::vector<std::vector<T>> contents;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Since the bindings are added iteratively, and the `VkWriteDescriptorSet` structure requires pointers to information structures, those pointers cannot be set on the fly when binding information. Instead, the pointer setup is delayed until all bindings are defined. As explained above, the offset parameter of the class is used to point to the right member of the target `VkWriteDescriptorSet`. Note that the use of `reinterpret_cast` prevents any type checking, and hence the offsets need to be provided accurately to avoid undefined behavior.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/// Since the VkWriteDescriptorSet structure requires pointers to the info descriptors, and we
/// use std::vector to store those, the pointers can be set only when we are finished adding
/// data in the vectors.
/// The SetPointers then writes the info descriptor at the proper offset in
/// the VkWriteDescriptorSet structure
void SetPointers()
{
  for(size_t i = 0; i < writeDesc.size(); i++)
  {
    T** dest = reinterpret_cast<T**>(reinterpret_cast<uint8_t*>(&writeDesc[i]) + offset);
    *dest = contents[i].data();
  }
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The binding of information to a descriptor set is done using the `Bind` method, which first creates a `VkWriteDescriptorSet`. It then checks whether that binding already exists, in which case it simply replaces the binding. Otherwise, it adds the `VkWriteDescriptorSet` and information to the class members.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/// Bind a vector of info descriptors to a slot in the descriptor set
void Bind(VkDescriptorSet set,        /// Target descriptor set
          uint32_t binding,           /// Slot in the descriptor set the infos will be bound to
          VkDescriptorType type,      /// Type of the descriptor
          const std::vector<T>& info  /// Descriptor infos to bind
)
{
  // Initialize the descriptor write, keeping all the resource pointers to NULL since they will
  // be set by SetPointers once all resources have been bound
  VkWriteDescriptorSet descriptorWrite = {};
  descriptorWrite.sType = VK_STRUCTURE_TYPE_WRITE_DESCRIPTOR_SET;
  descriptorWrite.dstSet = set;
  descriptorWrite.dstBinding = binding;
  descriptorWrite.dstArrayElement = 0;
  descriptorWrite.descriptorType = type;
  descriptorWrite.descriptorCount = static_cast<uint32_t>(info.size());
  descriptorWrite.pBufferInfo = nullptr;
  descriptorWrite.pImageInfo = nullptr;
  descriptorWrite.pTexelBufferView = nullptr;
  descriptorWrite.pNext = nullptr;

  // If the binding point had already been used in a Bind call, replace the binding info
  // Linear search, not so great - hopefully not too many binding points
  for(size_t i = 0; i < writeDesc.size(); i++)
  {
    if(writeDesc[i].dstBinding == binding)
    {
      writeDesc[i] = descriptorWrite;
      contents[i] = info;
      return;
    }
  }
  // Add the write descriptor and resource info for later actual binding
  writeDesc.push_back(descriptorWrite);
  contents.push_back(info);
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

## Class members

The `DescriptorSetGenerator` class contains an association of the binding points with their layout binding information, and a number of `WriteInfo` instantiations for a few descriptor information structures. Note that this is not an exhaustive list, and further `WriteInfo` members should be added, for example for texture management. The `offsetof` macro provides the offsets for the structure members. Since ray tracing is an extension, the `VkWriteDescriptorSetAccelerationStructureNV` structure is attached through the `pNext` member of `VkWriteDescriptorSet`.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/// Association of the binding slot index with the binding information
std::unordered_map<uint32_t, VkDescriptorSetLayoutBinding> m_bindings;

/// Buffer binding requests. Buffer descriptor infos are written into the pBufferInfo member of
/// the VkWriteDescriptorSet structure
WriteInfo<VkDescriptorBufferInfo, offsetof(VkWriteDescriptorSet, pBufferInfo)> m_buffers;
/// Image binding requests. Image descriptor infos are written into the pImageInfo member of
/// the VkWriteDescriptorSet structure
WriteInfo<VkDescriptorImageInfo, offsetof(VkWriteDescriptorSet, pImageInfo)> m_images;
/// Acceleration structure binding requests.
/// Since this is using a non-core extension, AS
/// descriptor infos are written into the pNext member of the VkWriteDescriptorSet structure
WriteInfo<VkWriteDescriptorSetAccelerationStructureNV, offsetof(VkWriteDescriptorSet, pNext)>
    m_accelerationStructures;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

## Binding

Declaring a binding is done in the `AddBinding` method, which simply records the association of a binding point with a `VkDescriptorSetLayoutBinding`:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Add a binding to the descriptor set
void DescriptorSetGenerator::AddBinding(
    uint32_t binding,          // Slot to which the descriptor will be bound, corresponding to the
                               // layout index in the shader
    uint32_t descriptorCount,  // Number of descriptors to bind
    VkDescriptorType type,     // Type of the bound descriptor(s)
    VkShaderStageFlags stage,  // Shader stage at which the bound resources will be available
    VkSampler* sampler         // Corresponding sampler, in case of textures
)
{
  VkDescriptorSetLayoutBinding b = {};
  b.binding = binding;
  b.descriptorCount = descriptorCount;
  b.descriptorType = type;
  b.pImmutableSamplers = sampler;
  b.stageFlags = stage;

  // Sanity check to avoid binding different resources to the same binding point
  if(m_bindings.find(binding) != m_bindings.end())
  {
    throw std::logic_error("Binding collision");
  }
  m_bindings[binding] = b;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Once all the bindings are declared, and a descriptor set created from those bindings, it is possible to bind specific resources using the `WriteInfo` class:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
//--------------------------------------------------------------------------------------------------
// Bind a buffer
void DescriptorSetGenerator::Bind(VkDescriptorSet set, uint32_t binding,
                                  const std::vector<VkDescriptorBufferInfo>& bufferInfo)
{
  m_buffers.Bind(set,
                 binding, m_bindings[binding].descriptorType, bufferInfo);
}

//--------------------------------------------------------------------------------------------------
// Bind an image
void DescriptorSetGenerator::Bind(VkDescriptorSet set, uint32_t binding,
                                  const std::vector<VkDescriptorImageInfo>& imageInfo)
{
  m_images.Bind(set, binding, m_bindings[binding].descriptorType, imageInfo);
}

//--------------------------------------------------------------------------------------------------
// Bind an acceleration structure
void DescriptorSetGenerator::Bind(
    VkDescriptorSet set, uint32_t binding,
    const std::vector<VkWriteDescriptorSetAccelerationStructureNV>& accelInfo)
{
  m_accelerationStructures.Bind(set, binding, m_bindings[binding].descriptorType, accelInfo);
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Once all the bindings are set, the actual contents of the descriptor set need to be written. We first update the pointers of the `VkWriteDescriptorSet` structures for each kind of information, using the `SetPointers` method of `WriteInfo`. For each type of information structure, we then call `vkUpdateDescriptorSets`.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Actually write the binding info into the descriptor set
void DescriptorSetGenerator::UpdateSetContents(VkDevice device, VkDescriptorSet set)
{
  // For each resource type, set the actual pointers in the VkWriteDescriptorSet structures, and
  // write the resulting structures into the descriptor set
  m_buffers.SetPointers();
  vkUpdateDescriptorSets(device, static_cast<uint32_t>(m_buffers.writeDesc.size()),
                         m_buffers.writeDesc.data(), 0, nullptr);

  m_images.SetPointers();
  vkUpdateDescriptorSets(device, static_cast<uint32_t>(m_images.writeDesc.size()),
                         m_images.writeDesc.data(), 0, nullptr);

  m_accelerationStructures.SetPointers();
  vkUpdateDescriptorSets(device, static_cast<uint32_t>(m_accelerationStructures.writeDesc.size()),
                         m_accelerationStructures.writeDesc.data(), 0, nullptr);
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

## Generation

The descriptor pool will be the allocator for the descriptor sets. It is created with the number of descriptors of each type, and also needs to be provided with the maximum number of descriptor sets that can be allocated from the pool.
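When sizing a pool, it helps to remember that the per-type counts of a Vulkan descriptor pool are totals over every set allocated from it. The sketch below shows one possible way to derive those totals, assuming all `maxSets` sets use the same layout; it is not the helper's implementation. `Binding`, `PoolSize` and `AggregatePoolSizes` are hypothetical stand-ins (for `VkDescriptorSetLayoutBinding`, `VkDescriptorPoolSize` and a sizing routine) so that the example compiles without the Vulkan headers, and the numeric descriptor types are arbitrary identifiers.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical stand-ins for VkDescriptorSetLayoutBinding and VkDescriptorPoolSize
struct Binding
{
  uint32_t type;
  uint32_t descriptorCount;
};
struct PoolSize
{
  uint32_t type;
  uint32_t descriptorCount;
};

// Sum the descriptor counts per type, scaled by the number of sets the pool must serve,
// and return one entry per descriptor type (sorted by type identifier)
std::vector<PoolSize> AggregatePoolSizes(const std::vector<Binding>& bindings, uint32_t maxSets)
{
  std::map<uint32_t, uint32_t> totals;
  for(const Binding& b : bindings)
  {
    totals[b.type] += b.descriptorCount * maxSets;
  }

  std::vector<PoolSize> sizes;
  sizes.reserve(totals.size());
  for(const auto& t : totals)
  {
    sizes.push_back({t.first, t.second});
  }
  return sizes;
}
```

With `maxSets` equal to 1, the totals are simply the per-type sums of the declared bindings.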
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Once the bindings have been added, this generates the descriptor pool with enough space to
// handle all the bound resources and allocate up to maxSets descriptor sets
VkDescriptorPool DescriptorSetGenerator::GeneratePool(VkDevice device, uint32_t maxSets /* = 1 */)
{
  VkDescriptorPool pool;

  // Aggregate the bindings to obtain the required size of the descriptors using that layout
  std::vector<VkDescriptorPoolSize> counters;
  counters.reserve(m_bindings.size());
  for(const auto& b : m_bindings)
  {
    counters.push_back({b.second.descriptorType, b.second.descriptorCount});
  }

  // Create the pool information descriptor, that contains the number of descriptors of each type
  VkDescriptorPoolCreateInfo poolInfo = {};
  poolInfo.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_POOL_CREATE_INFO;
  poolInfo.poolSizeCount = static_cast<uint32_t>(counters.size());
  poolInfo.pPoolSizes = counters.data();
  poolInfo.maxSets = maxSets;

  // Create the actual descriptor pool
  if(vkCreateDescriptorPool(device, &poolInfo, nullptr, &pool) != VK_SUCCESS)
  {
    throw std::runtime_error("failed to create descriptor pool!");
  }
  return pool;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The `GenerateLayout` method returns a `VkDescriptorSetLayout` representing the bindings declared with `AddBinding`. Each `VkDescriptorSetLayoutBinding` carries its own binding index, so the order in which the bindings are stored in the vector does not affect the resulting layout:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Once the bindings have been added, this generates the descriptor layout corresponding to the
// bound resources
VkDescriptorSetLayout DescriptorSetGenerator::GenerateLayout(VkDevice device)
{
  VkDescriptorSetLayout layout;

  // Build the vector of bindings
  // For production, this copy should be avoided
  std::vector<VkDescriptorSetLayoutBinding> bindings;
  bindings.reserve(m_bindings.size());
  for(const auto& b : m_bindings)
  {
    bindings.push_back(b.second);
  }

  // Create the layout from the vector of bindings
  VkDescriptorSetLayoutCreateInfo layoutInfo = {};
  layoutInfo.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_LAYOUT_CREATE_INFO;
  layoutInfo.bindingCount = static_cast<uint32_t>(bindings.size());
  layoutInfo.pBindings = bindings.data();

  if(vkCreateDescriptorSetLayout(device, &layoutInfo, nullptr, &layout) != VK_SUCCESS)
  {
    throw std::runtime_error("failed to create descriptor set layout!");
  }
  return layout;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Finally, the descriptor set is generated in `GenerateSet` which, from a descriptor pool and descriptor set layout, will call `vkAllocateDescriptorSets`:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Generate a descriptor set from the pool and layout
VkDescriptorSet DescriptorSetGenerator::GenerateSet(VkDevice device, VkDescriptorPool pool,
                                                    VkDescriptorSetLayout layout)
{
  VkDescriptorSet set;

  VkDescriptorSetLayout layouts[] = {layout};
  VkDescriptorSetAllocateInfo allocInfo = {};
  allocInfo.sType = VK_STRUCTURE_TYPE_DESCRIPTOR_SET_ALLOCATE_INFO;
  allocInfo.descriptorPool = pool;
  allocInfo.descriptorSetCount = 1;
  allocInfo.pSetLayouts = layouts;

  if(vkAllocateDescriptorSets(device, &allocInfo, &set) != VK_SUCCESS)
  {
    throw std::runtime_error("failed to allocate descriptor set!");
  }
  return set;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

# Ray Tracing Pipeline

The ray tracing pipeline combines the ray tracing shaders into a pipeline object, which can be thought of as an executable GPU program. For that, it requires the shaders compiled as `VkShaderModule`, each of them associated with a `VkPipelineShaderStageCreateInfo` representing the shader stage at which the module will be used.

Simple usage of this class: we create the ray generation and miss stages from precompiled SPIR-V modules, then start creating a hit group.
A hit group can contain an intersection shader, an any-hit shader and a closest-hit shader. Since the intersection and any-hit shaders are optional, we only add the closest hit and close the hit group. The recursion depth indicates how many `traceNV()` calls can be nested, i.e. how many hit shaders can be recursively called. For example, tracing only primary rays from the camera corresponds to a depth of 1. If shadow rays are traced from the hits, then the depth will be 2. In general, it is best to keep that depth as low as possible. Finally, we compile the pipeline using `Generate` into what can be thought of as an executable program representing the ray tracing process.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
nv_helpers_vk::RayTracingPipelineGenerator pipelineGen;
// We use only one ray generation, that will implement the camera model
m_rayGenIndex =
    pipelineGen.AddRayGenShaderStage(VkCtx.createShaderModule(readFile("shaders/raygen.spv")));
// The first miss shader is used to look-up the environment in case the rays from the camera miss
// the geometry
m_missIndex =
    pipelineGen.AddMissShaderStage(VkCtx.createShaderModule(readFile("shaders/miss.spv")));

// The first hit group defines the shaders invoked when a ray shot from the camera hits the
// geometry. In this case we only specify the closest hit shader, and rely on the built-in
// triangle intersection and pass-through any-hit shader. However, explicit intersection and
// any hit shaders could be added as well.
m_hitGroupIndex = pipelineGen.StartHitGroup();
pipelineGen.AddHitShaderStage(VkCtx.createShaderModule(readFile("shaders/closesthit.spv")),
                              VK_SHADER_STAGE_CLOSEST_HIT_BIT_NV);
pipelineGen.EndHitGroup();

pipelineGen.SetMaxRecursionDepth(1);

// Generate the pipeline
pipelineGen.Generate(VkCtx.getDevice(), m_rtDescriptorSetLayout, &m_rtPipeline,
                     &m_rtPipelineLayout);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In the ray tracing pipeline, shaders are aggregated into groups. There are 3 types of shader groups, which may contain different shader types:

- `VK_RAY_TRACING_SHADER_GROUP_TYPE_GENERAL_NV` shader groups are used to specify non-hit shaders. They must contain only one shader reference, which may be a ray generation shader, a miss shader or a callable shader.
- `VK_RAY_TRACING_SHADER_GROUP_TYPE_TRIANGLES_HIT_GROUP_NV` shader groups are used to specify hit shaders that are used with the built-in triangle intersection shader. They may contain references to both a closest hit shader and an any hit shader.
- `VK_RAY_TRACING_SHADER_GROUP_TYPE_PROCEDURAL_HIT_GROUP_NV` shader groups are used to specify hit shaders which will be used together with a custom intersection shader. The group must contain a reference to an intersection shader and one or both of closest hit and any hit shaders.

The input to the ray tracing pipeline build process is a list of shader modules together with their hit group information. Shaders are referenced in each group by their position in the list of shaders in the pipeline: each shader reference is an index into this list. The shader groups themselves are later used to build the SBT. The application queries the API for an opaque handle to each shader group in the pipeline, which can then be written to the SBT.

The `RayTracingPipelineGenerator` class handles the details of creating and managing shader groups. It contains the list of shader stages, as well as the list of shader groups.
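The relationship between the two lists can be modeled without any Vulkan type. The sketch below is only an illustration of the index bookkeeping, under the assumption that each general shader gets its own group and that all shaders of a hit group share one group: `GroupBuilder`, `Stage`, `GroupType`, `ShaderGroup` and `kUnused` are hypothetical names, with `kUnused` playing the role of `VK_SHADER_UNUSED_NV`.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

constexpr uint32_t kUnused = ~0u;  // Plays the role of VK_SHADER_UNUSED_NV

enum class Stage { RayGen, Miss, ClosestHit, AnyHit, Intersection };
enum class GroupType { General, TrianglesHit, ProceduralHit };

// Each member is an index into the list of stages, or kUnused
struct ShaderGroup
{
  GroupType type;
  uint32_t  general, closestHit, anyHit, intersection;
};

class GroupBuilder
{
public:
  // Ray generation or miss: one stage, alone in a general group. Returns the group index.
  uint32_t AddGeneral(Stage s)
  {
    if(m_open || (s != Stage::RayGen && s != Stage::Miss))
      throw std::logic_error("Invalid general stage");
    m_groups.push_back({GroupType::General, PushStage(s), kUnused, kUnused, kUnused});
    return m_current++;
  }

  // Open a hit group, assumed to use the built-in triangle intersection until told otherwise
  uint32_t StartHitGroup()
  {
    if(m_open)
      throw std::logic_error("Hit group already open");
    m_groups.push_back({GroupType::TrianglesHit, kUnused, kUnused, kUnused, kUnused});
    m_open = true;
    return m_current;
  }

  // Add a stage to the open hit group; an intersection stage makes the group procedural
  void AddHitStage(Stage s)
  {
    if(!m_open || s == Stage::RayGen || s == Stage::Miss)
      throw std::logic_error("Invalid hit stage");
    ShaderGroup& g = m_groups[m_current];
    if(s == Stage::ClosestHit)
      g.closestHit = PushStage(s);
    else if(s == Stage::AnyHit)
      g.anyHit = PushStage(s);
    else
    {
      g.intersection = PushStage(s);
      g.type         = GroupType::ProceduralHit;
    }
  }

  void EndHitGroup()
  {
    if(!m_open)
      throw std::logic_error("No hit group open");
    m_open = false;
    m_current++;
  }

  const std::vector<ShaderGroup>& Groups() const { return m_groups; }

private:
  // Append a stage and return its position in the stage list
  uint32_t PushStage(Stage s)
  {
    m_stages.push_back(s);
    return static_cast<uint32_t>(m_stages.size() - 1);
  }

  std::vector<Stage>       m_stages;
  std::vector<ShaderGroup> m_groups;
  uint32_t                 m_current = 0;
  bool                     m_open    = false;
};
```

Adding a ray generation stage, a miss stage and then a hit group holding an intersection and a closest hit stage yields three groups referencing four stages, the last group being procedural.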
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/// Shader stages contained in the pipeline
std::vector<VkPipelineShaderStageCreateInfo> m_shaderStages;

/// Each shader stage belongs to a group. There are 3 group types: general,
/// triangle hit and procedural hit.
/// The general group type (VK_RAY_TRACING_SHADER_GROUP_TYPE_GENERAL_NV) is used for raygen,
/// miss and callable shaders.
/// The triangle hit group type (VK_RAY_TRACING_SHADER_GROUP_TYPE_TRIANGLES_HIT_GROUP_NV)
/// is used for closest hit and any hit shaders, when used together with the built-in
/// ray-triangle intersection shader.
/// The procedural hit group type (VK_RAY_TRACING_SHADER_GROUP_TYPE_PROCEDURAL_HIT_GROUP_NV)
/// is used for custom intersection shaders, and also groups closest hit and any hit shaders
/// that are used together with that intersection shader.
std::vector<VkRayTracingShaderGroupCreateInfoNV> m_shaderGroups;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Since the helper works by adding shader stages in an immediate mode style, it needs to store the current group number and the fact that a hit group is currently open.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/// Index of the current hit group
uint32_t m_currentGroupIndex = 0;
/// True if a group description is currently started
bool m_isHitGroupOpen = false;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Finally, it also stores the maximum recursion depth:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/// Maximum recursion depth, initialized to 1 to at least allow tracing primary rays
uint32_t m_maxRecursionDepth = 1;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Adding a ray generation shader is simply done by passing the corresponding module to `AddRayGenShaderStage`, which creates a `VkPipelineShaderStageCreateInfo`, adds it to the list of shader stages, then creates a `VkRayTracingShaderGroupCreateInfoNV` and adds it to the list of shader groups, and finally returns the index of the created group containing the ray generation shader:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Add a ray generation shader stage, and return the index of the created group
uint32_t RayTracingPipelineGenerator::AddRayGenShaderStage(VkShaderModule module)
{
  if(m_isHitGroupOpen)
  {
    throw std::logic_error("Cannot add raygen stage while a hit group is open");
  }

  VkPipelineShaderStageCreateInfo stageCreate;
  stageCreate.sType = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO;
  stageCreate.pNext = nullptr;
  stageCreate.stage = VK_SHADER_STAGE_RAYGEN_BIT_NV;
  stageCreate.module = module;
  // This member has to be 'main', regardless of the actual entry point of the shader
  stageCreate.pName = "main";
  stageCreate.flags = 0;
  stageCreate.pSpecializationInfo = nullptr;

  m_shaderStages.emplace_back(stageCreate);
  uint32_t shaderIndex = static_cast<uint32_t>(m_shaderStages.size() - 1);

  VkRayTracingShaderGroupCreateInfoNV groupInfo;
  groupInfo.sType = VK_STRUCTURE_TYPE_RAY_TRACING_SHADER_GROUP_CREATE_INFO_NV;
  groupInfo.pNext = nullptr;
  groupInfo.type = VK_RAY_TRACING_SHADER_GROUP_TYPE_GENERAL_NV;
  groupInfo.generalShader = shaderIndex;
  groupInfo.closestHitShader = VK_SHADER_UNUSED_NV;
  groupInfo.anyHitShader = VK_SHADER_UNUSED_NV;
  groupInfo.intersectionShader = VK_SHADER_UNUSED_NV;
  m_shaderGroups.emplace_back(groupInfo);

  return m_currentGroupIndex++;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Adding a miss shader works exactly like adding the ray generation shader:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Add a miss shader stage, and return the index of the created group
uint32_t RayTracingPipelineGenerator::AddMissShaderStage(VkShaderModule module)
{
  if(m_isHitGroupOpen)
  {
    throw std::logic_error("Cannot add miss stage while a hit group is open");
  }

  VkPipelineShaderStageCreateInfo stageCreate;
  stageCreate.sType = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO;
  stageCreate.pNext = nullptr;
  stageCreate.stage = VK_SHADER_STAGE_MISS_BIT_NV;
  stageCreate.module = module;
  // This member has to be 'main', regardless of the actual entry point of the shader
  stageCreate.pName = "main";
  stageCreate.flags = 0;
  stageCreate.pSpecializationInfo = nullptr;

  m_shaderStages.emplace_back(stageCreate);
  uint32_t shaderIndex = static_cast<uint32_t>(m_shaderStages.size() - 1);

  VkRayTracingShaderGroupCreateInfoNV groupInfo;
  groupInfo.sType = VK_STRUCTURE_TYPE_RAY_TRACING_SHADER_GROUP_CREATE_INFO_NV;
  groupInfo.pNext = nullptr;
  groupInfo.type = VK_RAY_TRACING_SHADER_GROUP_TYPE_GENERAL_NV;
  groupInfo.generalShader = shaderIndex;
  groupInfo.closestHitShader = VK_SHADER_UNUSED_NV;
  groupInfo.anyHitShader = VK_SHADER_UNUSED_NV;
  groupInfo.intersectionShader = VK_SHADER_UNUSED_NV;
  m_shaderGroups.emplace_back(groupInfo);

  return m_currentGroupIndex++;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Since a hit group can contain several shaders (intersection, any-hit, closest-hit), we use a simple state to start and finish the definition of the group. The addition of the shaders themselves is similar to the ray generation and miss shaders: the only difference is that all shaders within a group share the same group index. We also track the type of hit group, and update it if an intersection shader is added.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Start the description of a hit group, which contains at least a closest hit shader, but may
// also contain an intersection shader and an any-hit shader. The method outputs the index of the
// created hit group
uint32_t RayTracingPipelineGenerator::StartHitGroup()
{
  if(m_isHitGroupOpen)
  {
    throw std::logic_error("Hit group already open");
  }

  VkRayTracingShaderGroupCreateInfoNV groupInfo;
  groupInfo.sType = VK_STRUCTURE_TYPE_RAY_TRACING_SHADER_GROUP_CREATE_INFO_NV;
  groupInfo.pNext = nullptr;
  groupInfo.type = VK_RAY_TRACING_SHADER_GROUP_TYPE_TRIANGLES_HIT_GROUP_NV;
  groupInfo.generalShader = VK_SHADER_UNUSED_NV;
  groupInfo.closestHitShader = VK_SHADER_UNUSED_NV;
  groupInfo.anyHitShader = VK_SHADER_UNUSED_NV;
  groupInfo.intersectionShader = VK_SHADER_UNUSED_NV;
  m_shaderGroups.push_back(groupInfo);

  m_isHitGroupOpen = true;
  return m_currentGroupIndex;
}

//--------------------------------------------------------------------------------------------------
// Add a hit shader stage in the current hit group, where the stage can be
// VK_SHADER_STAGE_ANY_HIT_BIT_NV, VK_SHADER_STAGE_CLOSEST_HIT_BIT_NV, or
// VK_SHADER_STAGE_INTERSECTION_BIT_NV
uint32_t RayTracingPipelineGenerator::AddHitShaderStage(VkShaderModule module,
                                                        VkShaderStageFlagBits shaderStage)
{
  if(!m_isHitGroupOpen)
  {
    throw std::logic_error("Cannot add hit stage when no hit group is open");
  }

  auto& group =
      m_shaderGroups[m_currentGroupIndex];

  // Reject a shader type that was already set for the current hit group
  switch(shaderStage)
  {
    case VK_SHADER_STAGE_ANY_HIT_BIT_NV:
      if(group.anyHitShader != VK_SHADER_UNUSED_NV)
      {
        throw std::logic_error("Any hit shader already specified for current hit group");
      }
      break;
    case VK_SHADER_STAGE_CLOSEST_HIT_BIT_NV:
      if(group.closestHitShader != VK_SHADER_UNUSED_NV)
      {
        throw std::logic_error("Closest hit shader already specified for current hit group");
      }
      break;
    case VK_SHADER_STAGE_INTERSECTION_BIT_NV:
      if(group.intersectionShader != VK_SHADER_UNUSED_NV)
      {
        throw std::logic_error("Intersection shader already specified for current hit group");
      }
      break;
    default:
      throw std::logic_error("Invalid hit shader type");
  }

  VkPipelineShaderStageCreateInfo stageCreate;
  stageCreate.sType = VK_STRUCTURE_TYPE_PIPELINE_SHADER_STAGE_CREATE_INFO;
  stageCreate.pNext = nullptr;
  stageCreate.stage = shaderStage;
  stageCreate.module = module;
  // This member has to be 'main', regardless of the actual entry point of the shader
  stageCreate.pName = "main";
  stageCreate.flags = 0;
  stageCreate.pSpecializationInfo = nullptr;

  m_shaderStages.emplace_back(stageCreate);
  uint32_t shaderIndex = static_cast<uint32_t>(m_shaderStages.size() - 1);

  // Record the stage index in the slot of the group matching the shader type
  switch(shaderStage)
  {
    case VK_SHADER_STAGE_ANY_HIT_BIT_NV:
      group.anyHitShader = shaderIndex;
      break;
    case VK_SHADER_STAGE_CLOSEST_HIT_BIT_NV:
      group.closestHitShader = shaderIndex;
      break;
    case VK_SHADER_STAGE_INTERSECTION_BIT_NV:
      // An intersection shader turns the group into a procedural hit group
      group.type = VK_RAY_TRACING_SHADER_GROUP_TYPE_PROCEDURAL_HIT_GROUP_NV;
      group.intersectionShader = shaderIndex;
      break;
  }

  return m_currentGroupIndex;
}

//--------------------------------------------------------------------------------------------------
// End the description of the hit group
void RayTracingPipelineGenerator::EndHitGroup()
{
  if(!m_isHitGroupOpen)
  {
    throw std::logic_error("No hit group open");
  }

  m_isHitGroupOpen = false;
  m_currentGroupIndex++;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The recursion depth setting is
straightforward:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Upon hitting a surface, a closest hit shader can issue a new TraceRay call. This parameter
// indicates the maximum level of recursion. Note that this depth should be kept as low as
// possible, typically 2, to allow hit shaders to trace shadow rays. Recursive ray tracing
// algorithms should be flattened to a loop in the ray generation program for best performance.
void RayTracingPipelineGenerator::SetMaxRecursionDepth(uint32_t maxDepth)
{
  m_maxRecursionDepth = maxDepth;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

All shaders in the pipeline reference resources within the same descriptor set, hence a single call to `vkCreatePipelineLayout` is enough to define how the shaders access their resources.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Compiles the raytracing state object
void RayTracingPipelineGenerator::Generate(VkDevice device,
                                           VkDescriptorSetLayout descriptorSetLayout,
                                           VkPipeline* pipeline,
                                           VkPipelineLayout* layout)
{
  // Create the layout of the pipeline following the provided descriptor set layout
  VkPipelineLayoutCreateInfo pipelineLayoutCreateInfo;
  pipelineLayoutCreateInfo.sType = VK_STRUCTURE_TYPE_PIPELINE_LAYOUT_CREATE_INFO;
  pipelineLayoutCreateInfo.pNext = nullptr;
  pipelineLayoutCreateInfo.flags = 0;
  pipelineLayoutCreateInfo.setLayoutCount = 1;
  pipelineLayoutCreateInfo.pSetLayouts = &descriptorSetLayout;
  pipelineLayoutCreateInfo.pushConstantRangeCount = 0;
  pipelineLayoutCreateInfo.pPushConstantRanges = nullptr;

  VkResult code = vkCreatePipelineLayout(device, &pipelineLayoutCreateInfo, nullptr, layout);
  if(code != VK_SUCCESS)
  {
    throw std::logic_error("rt vkCreatePipelineLayout failed");
  }
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Using this layout we
define the pipeline stages, group numbers and recursion levels within a `VkRayTracingPipelineCreateInfoNV` structure, which is then passed to `vkCreateRayTracingPipelinesNV` to compile the pipeline itself.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  // Assemble the shader stages and recursion depth info into the raytracing pipeline
  VkRayTracingPipelineCreateInfoNV rayPipelineInfo;
  rayPipelineInfo.sType = VK_STRUCTURE_TYPE_RAY_TRACING_PIPELINE_CREATE_INFO_NV;
  rayPipelineInfo.pNext = nullptr;
  rayPipelineInfo.flags = 0;
  rayPipelineInfo.stageCount = static_cast<uint32_t>(m_shaderStages.size());
  rayPipelineInfo.pStages = m_shaderStages.data();
  rayPipelineInfo.groupCount = static_cast<uint32_t>(m_shaderGroups.size());
  rayPipelineInfo.pGroups = m_shaderGroups.data();
  rayPipelineInfo.maxRecursionDepth = m_maxRecursionDepth;
  rayPipelineInfo.layout = *layout;
  rayPipelineInfo.basePipelineHandle = VK_NULL_HANDLE;
  rayPipelineInfo.basePipelineIndex = 0;

  // No pipeline cache is used, hence the VK_NULL_HANDLE
  code = vkCreateRayTracingPipelinesNV(device, VK_NULL_HANDLE, 1, &rayPipelineInfo, nullptr,
                                       pipeline);
  if(code != VK_SUCCESS)
  {
    throw std::logic_error("rt vkCreateRayTracingPipelinesNV failed");
  }
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

# Shader Binding Table

The Shader Binding Table (SBT) is the cornerstone of the Vulkan ray tracing setup: it associates the contents of the acceleration structures to the shaders and their resources. The `ShaderBindingTableGenerator` class is a helper to construct the SBT. It helps maintain the proper offsets of each element, required when constructing the SBT, but also when calling `vkCmdTraceRaysNV`.

Each record in the SBT consists of a shader or hit group identifier, followed by a set of 32-bit values representing offsets in the descriptor set attached to the pipeline.
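The record layout described above can be sketched in plain C++, without any Vulkan dependency. The `packRecord` function is a hypothetical illustration and not part of the helper. The identifier is treated as an opaque byte array, whose size on a real device is given by `VkPhysicalDeviceRayTracingPropertiesNV::shaderGroupHandleSize`.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical illustration of one SBT record: an opaque shader or hit group
// identifier, immediately followed by the 32-bit values attached to it.
std::vector<uint8_t> packRecord(const std::vector<uint8_t>& handle,
                                const std::vector<uint32_t>& inlineValues)
{
  assert(!handle.empty());
  std::vector<uint8_t> record(handle.size() + inlineValues.size() * sizeof(uint32_t));

  // The identifier always comes first
  std::memcpy(record.data(), handle.data(), handle.size());

  // The 32-bit values follow the identifier, if there are any
  if(!inlineValues.empty())
  {
    std::memcpy(record.data() + handle.size(), inlineValues.data(),
                inlineValues.size() * sizeof(uint32_t));
  }
  return record;
}
```

With an assumed 16-byte identifier and two values, the record occupies 24 bytes. The helper additionally pads every record of a category to a common, aligned stride.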
In a simple example, the ray generation shader, miss shader and hit group are added to the table without any accompanying values since they access members of the descriptor set directly.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Add the entry point, the ray generation program
m_sbtGen.AddRayGenerationProgram(m_rayGenIndex, {});
// Add the miss shader for the camera rays
m_sbtGen.AddMissProgram(m_missIndex, {});

// For each instance, we will have 1 hit group for the camera rays.
// When setting the instances in the top-level acceleration structure we indicated the index
// of the hit group in the shader binding table that will be invoked.

// Add the hit group defining the behavior upon hitting a surface with a camera ray
m_sbtGen.AddHitGroup(m_hitGroupIndex, {});
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Once the entries in the SBT have been defined, the size of the required SBT buffer on the GPU is computed by a call to `ComputeSBTSize`. In a way similar to the acceleration structure setup, this allows the application to know how much memory will be required to store the SBT on the GPU, and allocate the buffer as needed. Note that the helper will map the SBT buffer, and hence this buffer needs to be created in host-visible memory (`VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT`).
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Compute the required size for the SBT
VkDeviceSize shaderBindingTableSize = m_sbtGen.ComputeSBTSize(m_raytracingProperties);

// Allocate mappable memory to store the SBT
nv_helpers_vk::createBuffer(VkCtx.getPhysicalDevice(), VkCtx.getDevice(), shaderBindingTableSize,
                            VK_BUFFER_USAGE_TRANSFER_SRC_BIT, &m_shaderBindingTableBuffer,
                            &m_shaderBindingTableMem, VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Using the application-allocated buffer, the SBT is then generated by calling the `Generate` method:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Generate the SBT using mapping. For further performance a staging buffer should be used, so
// that the SBT is guaranteed to reside on GPU memory without overheads.
m_sbtGen.Generate(VkCtx.getDevice(), m_rtPipeline, m_shaderBindingTableBuffer,
                  m_shaderBindingTableMem);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The contents of the SBT are used during the ray tracing process, and for that the `vkCmdTraceRaysNV` call needs the appropriate buffers and offsets to address the right shaders and resources. This consistency is enforced by using the helper when calling `vkCmdTraceRaysNV` upon rendering. The helper provides a number of `Get*` methods for each shader category (ray generation, miss, hit group) to access the size of an SBT entry for that category, the size of the SBT section for that category, and the offset of that section in the buffer. Arbitrarily, the helper puts the ray generation first, followed by the miss shaders, then the hit groups. That is why the `rayGenOffset` of the ray generation section is at the beginning of the SBT buffer, while the address of the first miss shader is offset by the size of the ray generation section.
Similarly, we offset the address of the first hit group. Note that we use the same buffer to store all the entries required for tracing, but `vkCmdTraceRaysNV` would allow storing each category of shaders in a different buffer if needed.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
VkDeviceSize rayGenOffset = helloVulkan.m_sbtGen.GetRayGenOffset();
VkDeviceSize missOffset = helloVulkan.m_sbtGen.GetMissOffset();
VkDeviceSize missStride = helloVulkan.m_sbtGen.GetMissEntrySize();
VkDeviceSize hitGroupOffset = helloVulkan.m_sbtGen.GetHitGroupOffset();
VkDeviceSize hitGroupStride = helloVulkan.m_sbtGen.GetHitGroupEntrySize();

// The last buffer, offset and stride are for callable shaders, which are not used here
vkCmdTraceRaysNV(cmdBuff, helloVulkan.m_shaderBindingTableBuffer, rayGenOffset,
                 helloVulkan.m_shaderBindingTableBuffer, missOffset, missStride,
                 helloVulkan.m_shaderBindingTableBuffer, hitGroupOffset, hitGroupStride,
                 VK_NULL_HANDLE, 0, 0, helloVulkan.m_framebufferSize.width,
                 helloVulkan.m_framebufferSize.height, 1);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

## Private class members

An `SBTEntry` structure stores the group index of the shader, and a vector of bytes containing the values representing its resources (either offsets or 32-bit constants):

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/// Wrapper for SBT entries, each consisting of the group index of the program and a list of
/// values, which can be either offsets or raw 32-bit constants
struct SBTEntry
{
  SBTEntry(uint32_t groupIndex, std::vector<unsigned char> inlineData);

  uint32_t m_groupIndex;
  const std::vector<unsigned char> m_inlineData;
};
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The SBT helper maintains a list of shaders in each category: ray generation, miss and hit group.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/// Ray generation shader entries
std::vector<SBTEntry> m_rayGen;
/// Miss shader entries
std::vector<SBTEntry> m_miss;
/// Hit group entries
std::vector<SBTEntry> m_hitGroup;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

For each category, the size of an entry in the SBT depends on the maximum amount of inline data used by the shaders in that category. The helper computes those values automatically in `GetEntrySize`.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
uint32_t m_rayGenEntrySize;
uint32_t m_missEntrySize;
uint32_t m_hitGroupEntrySize;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The group indices are translated into program identifiers. The size in bytes of an identifier is provided by the device and is the same for all categories. The final size of the SBT is also stored after calling `ComputeSBTSize`.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
uint32_t m_progIdSize;
VkDeviceSize m_sbtSize;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

## AddRayGenerationProgram

This method adds a ray generation program by its group index, and appends the list of parameters associated with it (either offsets or constants).

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Add a ray generation program by group index, with its list of offsets or values
void ShaderBindingTableGenerator::AddRayGenerationProgram(
    uint32_t groupIndex, const std::vector<unsigned char>& inlineData)
{
  m_rayGen.emplace_back(SBTEntry(groupIndex, inlineData));
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

## AddMissProgram

Adds a miss program by its group index, and appends the list of parameters associated with it (either offsets or constants).
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Add a miss program by group index, with its list of offsets or values
void ShaderBindingTableGenerator::AddMissProgram(uint32_t groupIndex,
                                                 const std::vector<unsigned char>& inlineData)
{
  m_miss.emplace_back(SBTEntry(groupIndex, inlineData));
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

## AddHitGroup

Adds a hit group by its group index, and appends the list of parameters associated with it (either offsets or constants).

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Add a hit group by group index, with its list of offsets or values
void ShaderBindingTableGenerator::AddHitGroup(uint32_t groupIndex,
                                              const std::vector<unsigned char>& inlineData)
{
  m_hitGroup.emplace_back(SBTEntry(groupIndex, inlineData));
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

## ComputeSBTSize

The size of the Shader Binding Table depends on the set of programs and hit groups it contains, and on how many resources are required for each category of shader programs. We first query the size of a program identifier, which is dependent on the driver implementation. Then, for each shader category (ray generation, miss, hit group) we use the private `GetEntrySize()` method to compute the amount of memory required for an entry of each category. The size of the SBT is then given by the number of programs in each category and their SBT entry sizes. After calling `ComputeSBTSize` the application only has to allocate the SBT buffer in host-visible memory.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Compute the size of the SBT based on the set of programs and hit groups it contains
VkDeviceSize ShaderBindingTableGenerator::ComputeSBTSize(
    const VkPhysicalDeviceRayTracingPropertiesNV& props)
{
  // Size of a program identifier
  m_progIdSize = props.shaderGroupHandleSize;

  // Compute the entry size of each program type depending on the maximum number of parameters in
  // each category
  m_rayGenEntrySize = GetEntrySize(m_rayGen);
  m_missEntrySize = GetEntrySize(m_miss);
  m_hitGroupEntrySize = GetEntrySize(m_hitGroup);

  // The total SBT size is the sum of the entries for ray generation, miss and hit groups
  m_sbtSize = m_rayGenEntrySize * static_cast<VkDeviceSize>(m_rayGen.size())
              + m_missEntrySize * static_cast<VkDeviceSize>(m_miss.size())
              + m_hitGroupEntrySize * static_cast<VkDeviceSize>(m_hitGroup.size());
  return m_sbtSize;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

## GetEntrySize

This private method is invoked by `ComputeSBTSize`, and computes the size of the SBT entries for a set of entries, which is determined by finding the entry with the largest amount of inline data. An SBT entry then contains the program identifier followed by the inline data, in which each offset or constant takes 4 bytes. The entries need to be aligned on 16 bytes.
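The `ROUND_UP` macro used by the helper is not defined in this document. As a rough sketch of the computation, assuming a power-of-two alignment, the hypothetical `roundUp` and `entrySize` functions below reproduce the same arithmetic in plain C++:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for ROUND_UP: round value up to the next multiple of
// alignment, which is assumed to be a power of two.
uint64_t roundUp(uint64_t value, uint64_t alignment)
{
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return (value + alignment - 1) & ~(alignment - 1);
}

// Size of one SBT entry: the program identifier plus the largest inline data
// block of the category (in bytes), rounded up to 16 bytes.
uint64_t entrySize(uint64_t progIdSize, uint64_t maxInlineBytes)
{
  return roundUp(progIdSize + maxInlineBytes, 16);
}
```

Assuming a 16-byte identifier, an entry without inline data occupies 16 bytes, while 8 bytes of inline data grow the entry to 32 bytes.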
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Compute the size of the SBT entries for a set of entries, which is determined by their maximum
// number of parameters
VkDeviceSize ShaderBindingTableGenerator::GetEntrySize(const std::vector<SBTEntry>& entries)
{
  // Find the largest amount of inline data, in bytes, used by a single entry
  size_t maxArgs = 0;
  for(const auto& shader : entries)
  {
    maxArgs = std::max(maxArgs, shader.m_inlineData.size());
  }
  // An SBT entry is made of a program ID followed by its inline data (offsets or constants)
  VkDeviceSize entrySize = m_progIdSize + static_cast<VkDeviceSize>(maxArgs);

  // The entries of the shader binding table must be 16-bytes-aligned
  entrySize = ROUND_UP(entrySize, 16);

  return entrySize;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

## Generate

Once the SBT size has been computed and the application has allocated the SBT buffer in host-visible memory, the `Generate` method builds the actual contents of the SBT. We first map the SBT buffer to allow writing to it, hence the need for a host-visible buffer. Then, for each shader category, we copy the shader identifiers and resources using the private method `CopyShaderData`. This method returns the number of bytes written in the SBT to store this category of shader. We call this method first for the ray generation, then for the miss shaders, and finally for the hit groups, before unmapping the buffer.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Build the SBT and store it into sbtBuffer, which has to be pre-allocated in host-visible memory.
// Access to the raytracing pipeline object is required to fetch program identifiers
void ShaderBindingTableGenerator::Generate(VkDevice device,
                                           VkPipeline raytracingPipeline,
                                           VkBuffer sbtBuffer,
                                           VkDeviceMemory sbtMem)
{
  uint32_t groupCount = static_cast<uint32_t>(m_rayGen.size())
                        + static_cast<uint32_t>(m_miss.size())
                        + static_cast<uint32_t>(m_hitGroup.size());

  // Fetch all the shader handles used in the pipeline, so that they can be written in the SBT
  // Note that this could be also done by fetching the handles one by one when writing the SBT
  // entries. A std::vector releases the storage automatically, even if an exception is thrown
  std::vector<uint8_t> shaderHandleStorage(groupCount * m_progIdSize);
  VkResult code = vkGetRayTracingShaderGroupHandlesNV(device, raytracingPipeline, 0, groupCount,
                                                      m_progIdSize * groupCount,
                                                      shaderHandleStorage.data());
  if(code != VK_SUCCESS)
  {
    throw std::logic_error("SBT vkGetRayTracingShaderGroupHandlesNV failed");
  }

  // Map the SBT
  void* vData;
  code = vkMapMemory(device, sbtMem, 0, m_sbtSize, 0, &vData);
  if(code != VK_SUCCESS)
  {
    throw std::logic_error("SBT vkMapMemory failed");
  }

  auto* data = static_cast<uint8_t*>(vData);

  // Copy the shader identifiers followed by their inline data (offsets or constants): first the
  // ray generation, then the miss shaders, and finally the set of hit groups
  VkDeviceSize offset = 0;

  offset = CopyShaderData(device, raytracingPipeline, data, m_rayGen, m_rayGenEntrySize,
                          shaderHandleStorage.data());
  data += offset;

  offset = CopyShaderData(device, raytracingPipeline, data, m_miss, m_missEntrySize,
                          shaderHandleStorage.data());
  data += offset;

  offset = CopyShaderData(device, raytracingPipeline, data, m_hitGroup, m_hitGroupEntrySize,
                          shaderHandleStorage.data());

  // Unmap the SBT
  vkUnmapMemory(device, sbtMem);
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

## CopyShaderData

For each entry, this private method copies the shader identifier followed by its offsets and/or constants in `outputData`, with a stride in bytes of `entrySize`, and returns the size in bytes actually written to `outputData`.
We iterate through the list of entries, and for each one copy the shader identifier and its inline data to the SBT. At the end we return the number of bytes written, which is given by the number of entries times the size of an entry.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// For each entry, copy the shader identifier followed by its offsets and/or constants in
// outputData, with a stride in bytes of entrySize, and return the size in bytes actually
// written to outputData.
VkDeviceSize ShaderBindingTableGenerator::CopyShaderData(VkDevice device,
                                                         VkPipeline pipeline,
                                                         uint8_t* outputData,
                                                         const std::vector<SBTEntry>& shaders,
                                                         VkDeviceSize entrySize,
                                                         const uint8_t* shaderHandleStorage)
{
  uint8_t* pData = outputData;
  for(const auto& shader : shaders)
  {
    // Copy the shader identifier that was previously obtained with
    // vkGetRayTracingShaderGroupHandlesNV
    memcpy(pData, shaderHandleStorage + shader.m_groupIndex * m_progIdSize, m_progIdSize);

    // Copy all its offsets or values in bulk
    if(!shader.m_inlineData.empty())
    {
      memcpy(pData + m_progIdSize, shader.m_inlineData.data(), shader.m_inlineData.size());
    }

    pData += entrySize;
  }
  // Return the number of bytes actually written to the output buffer
  return static_cast<VkDeviceSize>(shaders.size()) * entrySize;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

## Reset

This method simply resets all the parameters of the helper.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Reset the sets of programs and hit groups
void ShaderBindingTableGenerator::Reset()
{
  m_rayGen.clear();
  m_miss.clear();
  m_hitGroup.clear();

  m_rayGenEntrySize = 0;
  m_missEntrySize = 0;
  m_hitGroupEntrySize = 0;
  m_progIdSize = 0;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

## Getters

The following getters are used to simplify the call to `vkCmdTraceRaysNV`, where the offsets of the shader programs must exactly follow the SBT layout. Their implementation is straightforward, by accessing the precomputed entry sizes and the number of entries in each category.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
// Get the size in bytes of the SBT section dedicated to ray generation programs
VkDeviceSize ShaderBindingTableGenerator::GetRayGenSectionSize() const
{
  return m_rayGenEntrySize * static_cast<VkDeviceSize>(m_rayGen.size());
}

// Get the size in bytes of one ray generation program entry in the SBT
VkDeviceSize ShaderBindingTableGenerator::GetRayGenEntrySize() const
{
  return m_rayGenEntrySize;
}

// Get the offset in bytes of the ray generation section, which starts the SBT
VkDeviceSize ShaderBindingTableGenerator::GetRayGenOffset() const
{
  return 0;
}

// Get the size in bytes of the SBT section dedicated to miss programs
VkDeviceSize ShaderBindingTableGenerator::GetMissSectionSize() const
{
  return m_missEntrySize * static_cast<VkDeviceSize>(m_miss.size());
}

// Get the size in bytes of one miss program entry in the SBT
VkDeviceSize ShaderBindingTableGenerator::GetMissEntrySize() const
{
  return m_missEntrySize;
}

// Get the offset in bytes of the miss section
VkDeviceSize ShaderBindingTableGenerator::GetMissOffset() const
{
  // Miss is right after raygen
  return GetRayGenSectionSize();
}

// Get the size in bytes of the SBT section dedicated to hit groups
VkDeviceSize ShaderBindingTableGenerator::GetHitGroupSectionSize() const
{
  return m_hitGroupEntrySize * static_cast<VkDeviceSize>(m_hitGroup.size());
}

// Get the size in bytes of one hit group entry in the SBT
VkDeviceSize ShaderBindingTableGenerator::GetHitGroupEntrySize() const
{
  return m_hitGroupEntrySize;
}

// Get the offset in bytes of the hit group section
VkDeviceSize ShaderBindingTableGenerator::GetHitGroupOffset() const
{
  // Hit groups are after raygen and miss
  return GetRayGenSectionSize() + GetMissSectionSize();
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
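
As a worked example of the layout these getters describe, the sketch below recomputes the section offsets for a hypothetical table. The `SbtLayout` structure and the values used with it are assumptions made for illustration only. The entry sizes are taken as already aligned on 16 bytes.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the helper's bookkeeping: one aligned entry size and
// one entry count per shader category.
struct SbtLayout
{
  uint64_t rayGenEntrySize, rayGenCount;
  uint64_t missEntrySize, missCount;
  uint64_t hitGroupEntrySize, hitGroupCount;

  // Ray generation entries come first in the buffer
  uint64_t rayGenOffset() const { return 0; }

  // Miss entries start right after the whole ray generation section
  uint64_t missOffset() const { return rayGenEntrySize * rayGenCount; }

  // Hit groups start after the ray generation and miss sections
  uint64_t hitGroupOffset() const { return missOffset() + missEntrySize * missCount; }

  // Total number of bytes the application has to allocate for the table
  uint64_t totalSize() const
  {
    // The sketch assumes entry sizes that are already multiples of 16 bytes
    assert(rayGenEntrySize % 16 == 0 && missEntrySize % 16 == 0 && hitGroupEntrySize % 16 == 0);
    return hitGroupOffset() + hitGroupEntrySize * hitGroupCount;
  }
};
```

For one 16-byte ray generation entry, two 16-byte miss entries and three 32-byte hit groups, the miss section starts at byte 16, the hit groups at byte 48, and the table takes 144 bytes in total.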