This example describes the API of the code generated by the "PTX" backend for compiled materials and shows how a renderer can call this generated code to evaluate sub-expressions of multiple materials using CUDA.
New Topics
- MDL material state (PTX)
- Execution of generated code (PTX)
- Loading textures (PTX)
- Texture access functions (PTX)
Detailed Description
- MDL material state (PTX)
The MDL material state structure mi::neuraylib::Shading_state_material is a representation of the renderer state as defined in section 19 "Renderer state" in the MDL specification. It is used to make the state of the renderer (like the position of an intersection point on the surface, the shading normal and the texture coordinates) available to the generated code.
Here's a version of the material state structure making use of the types from CUDA's vector_types.h:
float3 normal;
float3 geom_normal;
float3 position;
float animation_time;
const float3 *text_coords;
const float3 *tangent_u;
const float3 *tangent_v;
float4 *text_results;
const char *ro_data_segment;
const float4 *world_to_object;
const float4 *object_to_world;
int object_id;
};
Please refer to the structure documentation for more information.
In this example, we fill the material state structure with some example values and only use one texture space. For the world-to-object and object-to-world transformation matrices we use identity matrices. We will iterate the position
and text_coords
fields over a 2x2 quad around the center of the world with position x
and y
coordinates ranging from -1 to 1 and the texture uv-coordinates ranging from 0 to 1, respectively.
- Execution of generated code (PTX)
For the non-native backends, the generated code has to be called directly from the corresponding framework, so we need to know the prototypes of the functions generated via mi::neuraylib::IMdl_backend::translate_material_expression(). With "NAME"
being the function name you provided as fname
parameter and "T"
being the result type, they look like this:
void NAME(
T *result,
Resource_data const *res_data,
void const *exception_state,
char const *captured_args);
or written as a PTX prototype:
.visible .func NAME(
.param .b64 result,
.param .b64 state,
.param .b64 res_data,
.param .b64 exception_state,
.param .b64 captured_args
);
The res_data
parameter is used to provide access to resources like textures depending on the way how those resources are accessed (see "Texture access functions" below). If it is not used, the pointers inside the structure may be NULL:
struct Resource_data {
const void *shared_data;
const Texture_handler_base *texture_handler;
};
The exception_state
parameter allows to provide handlers for out-of-bounds array access exceptions and division-by-zero exceptions. But for the PTX backend, this is not supported and the parameter should be set to NULL.
The captured_args
parameter is used to provide the data of the mi::neuraylib::ITarget_argument_block object for class-compiled materials. The data can either be manually created using the information from mi::neuraylib::ITarget_value_layout or by using mi::neuraylib::ITarget_code::get_argument_block() or mi::neuraylib::ITarget_code::create_argument_block(). For instance-compiled materials, this parameter should be set to NULL. See Instance-compilation and class-compilation for more details about instance and class compilation.
To make the generated functions available to our CUDA kernel, we have to link them with the kernel. We could just declare the generated functions with the corresponding names as extern "C"
in the CUDA source code of the kernel, but we may want to decide at runtime how many materials will be available. So we will add an indirection through an array of pointers to the generated functions which we provide as an additional PTX source code buffer to the CUDA linker.
- Note
- We currently have to add a dummy function to the PTX code containing this function pointer array, because the CUDA linker will otherwise just resolve the function addresses to zero.
-
Also, we need to compile the CUDA kernel with the option
-rdc=true
(relocatable device code), otherwise the extern
declared function array will be treated as a definition resulting in two arrays.
In this example, we bake multiple materials into a texture with a user-configurable checkerboard pattern by executing a material for every texel updating the material state accordingly. At the end, we write the texture to disk.
- Loading textures (PTX)
When the nv_freeimage
plugin has been loaded via mi::neuraylib::IMdl_compiler::load_plugin_library() before starting the MDL SDK, the SDK will automatically load textures on the host side for many common image formats and make them available via mi::neuraylib::ITarget_code::get_texture(). Note, that the first texture is always the invalid texture, so only if there is more than just one texture according to mi::neuraylib::ITarget_code::get_texture_count(), there will be real referenced textures available.
Here's a small code snippet showing how to access the mi::neuraylib::ICanvas of the texture at index i
.
The textures still have to be copied to the GPU and possibly they have to be gamma corrected and converted to a format understood by the texture access functions you provide. In this example, we use the mi::neuraylib::IImage_api to apply the gamma correction and to convert the image format to a float32 RGBA format.
Depending on the texture shape returned by mi::neuraylib::ITarget_code::get_texture_shape() the texture image data has to be copied to a CUDA array (2D textures), a 3D array (3D textures) or a 3D array with the cudaArrayCubemap
flag set (cube textures). The textures could then be made available via CUDA texture objects.
- Note
- For cube textures, you should use the
cudaAddressModeClamp
address mode for the texture objects to avoid visual artifacts in the corners.
- Texture access functions (PTX)
For non-native backends, the generated code requires a set of methods implementing texture access functionality:
- tex_lookup_float4_2d
- tex_lookup_float3_2d
- tex_texel_float4_2d
- tex_lookup_float4_3d
- tex_lookup_float3_3d
- tex_texel_float4_3d
- tex_lookup_float4_cube
- tex_lookup_float3_cube
- tex_resolution_2d
Except for the last one, these correspond directly to the functions described in section 20.3 "Standard library functions - Texture" in the MDL specification.
The tex_lookup_*
functions receive floating-point texture coordinates and should provide a sampled value, whereas tex_texel_*
functions receive integer texture coordinates and should provide a raw texture value. There are variants for texture lookups with and without alpha channel (float4 / float3) and for the different texture shapes (2d / 3d / cube) as described in section 6.12 "Variables and data types - Textures" in the MDL specification. Note, that PTEX textures are currently not supported by the backends.
tex_resolution_2d
retrieves the width and height at the given uv-tile coordinates for a texture_2d. For non-uv-tile textures, the uv-tile coordinates are always (0, 0).
For the PTX backend, there are different ways how the methods can be provided, which can be chosen by setting the "tex_lookup_call_mode"
option via the mi::neuraylib::IMdl_backend::set_option() method to the corresponding mode. Please refer to Texture lookup call modes of the PTX backend for more details.
In this example, you can switch from the "direct_call"
mode to the slower "vtable"
mode by commenting out this line in example_cuda_shared.h:
check_success(be_cuda_ptx->set_option("tex_lookup_call_mode", "direct_call") == 0);
Example Source
To compile the source code, you need to install the "CUDA Toolkit 9" available at https://developer.nvidia.com/cuda-downloads.
For Windows, you should install the toolkit with enabled Visual Studio integration to be able to use the provided project files.
For Linux and Mac OS X, you have to provide the path to the CUDA Toolkit installation via a "CUDA_PATH"
environment variable or by setting the path in the Makefile
.
Source Code Location: examples/mdl_sdk/execution_cuda/example_execution_cuda.cpp
#include <iostream>
#include <vector>
#include "example_cuda_shared.h"
struct Options {
unsigned material_pattern;
unsigned res_x, res_y;
bool use_class_compilation;
bool no_aa;
bool enable_derivatives;
Options()
: outputfile()
, material_pattern(0)
, res_x(700)
, res_y(520)
, use_class_compilation(false)
, no_aa(false)
, enable_derivatives(false)
{
}
};
Options &options,
{
CUfunction cuda_function;
char const *ptx_name = options.enable_derivatives ?
"example_execution_cuda_derivatives.ptx" : "example_execution_cuda.ptx";
CUmodule cuda_module = build_linked_kernel(
target_codes,
(get_executable_folder() + ptx_name).c_str(),
"evaluate_mat_expr",
&cuda_function);
Material_gpu_context material_gpu_context(options.enable_derivatives);
for (
size_t i = 0, num_target_codes = target_codes.
size(); i < num_target_codes; ++i) {
if (!material_gpu_context.prepare_target_code_data(
transaction, image_api, target_codes[i].get()))
return nullptr;
}
CUdeviceptr device_tc_data_list = material_gpu_context.get_device_target_code_data_list();
CUdeviceptr device_arg_block_list =
material_gpu_context.get_device_target_argument_block_list();
CUdeviceptr device_outbuf;
check_cuda_success(cuMemAlloc(&device_outbuf, options.res_x * options.res_y * sizeof(float3)));
dim3 threads_per_block(16, 16);
dim3 num_blocks((options.res_x + 15) / 16, (options.res_y + 15) / 16);
void *kernel_params[] = {
&device_outbuf,
&device_tc_data_list,
&device_arg_block_list,
&options.res_x,
&options.res_y,
&num_samples
};
check_cuda_success(cuLaunchKernel(
cuda_function,
num_blocks.x, num_blocks.y, num_blocks.z,
threads_per_block.x, threads_per_block.y, threads_per_block.z,
0, nullptr, kernel_params, nullptr));
image_api->
create_canvas(
"Rgb_fp", options.res_x, options.res_y));
float3 *data = static_cast<float3 *>(tile->get_data());
check_cuda_success(cuMemcpyDtoH(
data, device_outbuf, options.res_x * options.res_y * sizeof(float3)));
check_cuda_success(cuMemFree(device_outbuf));
check_cuda_success(cuModuleUnload(cuda_module));
}
void usage(char const *prog_name)
{
<< "Usage: " << prog_name << " [options] [(<material_pattern | (<material_name1> ...)]\n"
<< "Options:\n"
<< " --res <x> <y> resolution (default: 700x520)\n"
<< " --cc use class compilation\n"
<< " --noaa disable pixel oversampling\n"
<< " -d enable use of derivatives\n"
<< " -o <outputfile> image file to write result to\n"
<< " (default: example_cuda_<material_pattern>.png)\n"
<< " --mdl_path <path> mdl search path, can occur multiple times.\n"
<< " <material_pattern> a number from 1 to 2 ^ num_materials - 1 choosing which\n"
<< " material combination to use (default: 2 ^ num_materials - 1)\n"
<< " <material_name*> qualified name of materials to use. The example will try to\n"
<< " access the path \"surface.scattering.tint\"."
keep_console_open();
}
int main(int argc, char* argv[])
{
Options options;
options.mdl_paths.push_back(get_samples_mdl_root());
for (int i = 1; i < argc; ++i) {
char const *opt = argv[i];
if (opt[0] == '-') {
if (
strcmp(opt,
"-o") == 0 && i < argc - 1) {
options.outputfile = argv[++i];
}
else if (
strcmp(opt,
"--res") == 0 && i < argc - 2) {
}
else if (
strcmp(opt,
"--cc") == 0) {
options.use_class_compilation = true;
}
else if (
strcmp(opt,
"--noaa") == 0) {
options.no_aa = true;
}
else if (
strcmp(opt,
"-d") == 0) {
options.enable_derivatives = true;
}
else if (
strcmp(opt,
"--mdl_path") == 0 && i < argc - 1) {
options.mdl_paths.push_back(argv[++i]);
} else {
usage(argv[0]);
}
} else if (opt[0] >= '0' && opt[0] <= '9') {
options.material_pattern = unsigned(
atoi(opt));
} else
}
if (options.material_names.empty()) {
options.material_names.push_back("::nvidia::sdk_examples::tutorials::example_execution1");
options.material_names.push_back("::nvidia::sdk_examples::tutorials::example_execution2");
options.material_names.push_back("::nvidia::sdk_examples::tutorials::example_execution3");
}
if (options.material_pattern == 0)
options.material_pattern = (1 << options.material_names.size()) - 1;
else if (options.material_pattern < 1 ||
options.material_pattern > unsigned(1 << options.material_names.size()) - 1) {
usage(argv[0]);
}
if (options.outputfile.empty())
options.outputfile =
"example_cuda_" +
to_string(options.material_pattern) +
".png";
check_success(neuray.is_valid_interface());
for (
std::size_t i = 0; i < options.mdl_paths.size(); ++i)
check_success(mdl_compiler->
add_module_path(options.mdl_paths[i].c_str()) == 0);
check_start_success(result);
{
{
Material_compiler mc(
mdl_compiler.get(),
mdl_factory.get(),
transaction.get(),
0,
#if !defined(MDL_SOURCE_RELEASE) && defined(MDL_ENABLE_INTERPRETER)
false,
#endif
options.enable_derivatives);
for (unsigned i = 0, n = unsigned(options.material_names.size()); i < n; ++i) {
if ((options.material_pattern & (1 << i)) != 0) {
mc.add_material_subexpr(
options.material_names[i],
"surface.scattering.tint", "tint",
options.use_class_compilation);
}
}
target_codes.
push_back(mc.generate_cuda_ptx());
CUcontext cuda_context = init_cuda();
bake_expression_cuda_ptx(
transaction.get(),
image_api.get(),
target_codes,
options,
options.no_aa ? 1 : 8));
uninit_cuda(cuda_context);
if (canvas)
}
}
mdl_compiler = 0;
check_success(neuray->shutdown() == 0);
neuray = 0;
check_success(unload());
keep_console_open();
return EXIT_SUCCESS;
}
Source Code Location: examples/mdl_sdk/shared/texture_support_cuda.h
#ifndef TEXTURE_SUPPORT_CUDA_H
#define TEXTURE_SUPPORT_CUDA_H
#include <cuda.h>
#include <cuda_runtime.h>
#define USE_SMOOTHERSTEP_FILTER
#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif
#define M_ONE_OVER_PI 0.318309886183790671538
struct Texture
{
cudaTextureObject_t filtered_object;
cudaTextureObject_t unfiltered_object;
uint3 size;
float3 inv_size;
};
struct Mbsdf
{
unsigned has_data[2];
cudaTextureObject_t eval_data[2];
float max_albedo[2];
float* sample_data[2];
float* albedo_data[2];
uint2 angular_resolution[2];
float2 inv_angular_resolution[2];
unsigned num_channels[2];
};
struct Lightprofile
{
explicit Lightprofile()
: angular_resolution(make_uint2(0, 0))
, theta_phi_start(make_float2(0.0f, 0.0f))
, theta_phi_delta(make_float2(0.0f, 0.0f))
, theta_phi_inv_delta(make_float2(0.0f, 0.0f))
, candela_multiplier(0.0f)
, total_power(0.0f)
, eval_data(0)
{
}
uint2 angular_resolution;
float2 theta_phi_start;
float2 theta_phi_delta;
float2 theta_phi_inv_delta;
float candela_multiplier;
float total_power;
cudaTextureObject_t eval_data;
float* cdf_data;
};
struct Texture_handler : Texture_handler_base {
size_t num_textures;
Texture const *textures;
size_t num_mbsdfs;
Mbsdf const *mbsdfs;
size_t num_lightprofiles;
Lightprofile const *lightprofiles;
};
size_t num_textures;
Texture const *textures;
size_t num_mbsdfs;
Mbsdf const *mbsdfs;
size_t num_lightprofiles;
Lightprofile const *lightprofiles;
};
__device__ inline void store_result4(float res[4], const float4 &v)
{
res[0] = v.x;
res[1] = v.y;
res[2] = v.z;
res[3] = v.w;
}
__device__ inline void store_result4(float res[4], float s)
{
res[0] = res[1] = res[2] = res[3] = s;
}
__device__ inline void store_result4(
float res[4], float v0, float v1, float v2, float v3)
{
res[0] = v0;
res[1] = v1;
res[2] = v2;
res[3] = v3;
}
__device__ inline void store_result3(float res[3], float3 const&v)
{
res[0] = v.x;
res[1] = v.y;
res[2] = v.z;
}
__device__ inline void store_result3(float res[3], const float4 &v)
{
res[0] = v.x;
res[1] = v.y;
res[2] = v.z;
}
__device__ inline void store_result3(float res[3], float s)
{
res[0] = res[1] = res[2] = s;
}
__device__ inline void store_result3(float res[3], float v0, float v1, float v2)
{
res[0] = v0;
res[1] = v1;
res[2] = v2;
}
__device__ inline void store_result1(float* res, float3 const& v)
{
*res = 0.212671 * v.x + 0.715160 * v.y + 0.072169 * v.z;
}
__device__ inline void store_result1(float* res, float v0, float v1, float v2)
{
*res = 0.212671 * v0 + 0.715160 * v1 + 0.072169 * v2;
}
__device__ inline void store_result1(float* res, float s)
{
*res = s;
}
#define WRAP_AND_CROP_OR_RETURN_BLACK(val, inv_dim, wrap_mode, crop_vals, store_res_func) \
do { \
if ( (wrap_mode) == mi::neuraylib::TEX_WRAP_REPEAT && \
(crop_vals)[0] == 0.0f && (crop_vals)[1] == 1.0f ) { \
\
} \
else \
{ \
if ( (wrap_mode) == mi::neuraylib::TEX_WRAP_REPEAT ) \
val = val - floorf(val); \
else { \
if ( (wrap_mode) == mi::neuraylib::TEX_WRAP_CLIP && (val < 0.0f || val >= 1.0f) ) { \
store_res_func(result, 0.0f); \
return; \
} \
else if ( (wrap_mode) == mi::neuraylib::TEX_WRAP_MIRRORED_REPEAT ) { \
float floored_val = floorf(val); \
if ( (int(floored_val) & 1) != 0 ) \
val = 1.0f - (val - floored_val); \
else \
val = val - floored_val; \
} \
float inv_hdim = 0.5f * (inv_dim); \
val = fminf(fmaxf(val, inv_hdim), 1.f - inv_hdim); \
} \
val = val * ((crop_vals)[1] - (crop_vals)[0]) + (crop_vals)[0]; \
} \
} while ( 0 )
#ifdef USE_SMOOTHERSTEP_FILTER
#define APPLY_SMOOTHERSTEP_FILTER() \
do { \
u = u * tex.size.x + 0.5f; \
v = v * tex.size.y + 0.5f; \
\
float u_i = floorf(u), v_i = floorf(v); \
float u_f = u - u_i; \
float v_f = v - v_i; \
u_f = u_f * u_f * u_f * (u_f * (u_f * 6.f - 15.f) + 10.f); \
v_f = v_f * v_f * v_f * (v_f * (v_f * 6.f - 15.f) + 10.f); \
u = u_i + u_f; \
v = v_i + v_f; \
\
u = (u - 0.5f) * tex.inv_size.x; \
v = (v - 0.5f) * tex.inv_size.y; \
} while ( 0 )
#else
#define APPLY_SMOOTHERSTEP_FILTER()
#endif
extern "C" __device__ void tex_lookup_float4_2d(
float result[4],
Texture_handler_base const *self_base,
unsigned texture_idx,
float const coord[2],
Tex_wrap_mode const wrap_u,
Tex_wrap_mode const wrap_v,
float const crop_u[2],
float const crop_v[2])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
store_result4(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
float u = coord[0], v = coord[1];
WRAP_AND_CROP_OR_RETURN_BLACK(u, tex.inv_size.x, wrap_u, crop_u, store_result4);
WRAP_AND_CROP_OR_RETURN_BLACK(v, tex.inv_size.y, wrap_v, crop_v, store_result4);
APPLY_SMOOTHERSTEP_FILTER();
store_result4(result, tex2D<float4>(tex.filtered_object, u, v));
}
extern "C" __device__ void tex_lookup_deriv_float4_2d(
float result[4],
Texture_handler_base const *self_base,
unsigned texture_idx,
tct_deriv_float2 const *coord,
Tex_wrap_mode const wrap_u,
Tex_wrap_mode const wrap_v,
float const crop_u[2],
float const crop_v[2])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
store_result4(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
float u = coord->val.x, v = coord->val.y;
WRAP_AND_CROP_OR_RETURN_BLACK(u, tex.inv_size.x, wrap_u, crop_u, store_result4);
WRAP_AND_CROP_OR_RETURN_BLACK(v, tex.inv_size.y, wrap_v, crop_v, store_result4);
APPLY_SMOOTHERSTEP_FILTER();
store_result4(result, tex2DGrad<float4>(tex.filtered_object, u, v, coord->dx, coord->dy));
}
extern "C" __device__ void tex_lookup_float3_2d(
float result[3],
Texture_handler_base const *self_base,
unsigned texture_idx,
float const coord[2],
Tex_wrap_mode const wrap_u,
Tex_wrap_mode const wrap_v,
float const crop_u[2],
float const crop_v[2])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
store_result3(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
float u = coord[0], v = coord[1];
WRAP_AND_CROP_OR_RETURN_BLACK(u, tex.inv_size.x, wrap_u, crop_u, store_result3);
WRAP_AND_CROP_OR_RETURN_BLACK(v, tex.inv_size.y, wrap_v, crop_v, store_result3);
APPLY_SMOOTHERSTEP_FILTER();
store_result3(result, tex2D<float4>(tex.filtered_object, u, v));
}
extern "C" __device__ void tex_lookup_deriv_float3_2d(
float result[3],
Texture_handler_base const *self_base,
unsigned texture_idx,
tct_deriv_float2 const *coord,
Tex_wrap_mode const wrap_u,
Tex_wrap_mode const wrap_v,
float const crop_u[2],
float const crop_v[2])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
store_result3(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
float u = coord->val.x, v = coord->val.y;
WRAP_AND_CROP_OR_RETURN_BLACK(u, tex.inv_size.x, wrap_u, crop_u, store_result3);
WRAP_AND_CROP_OR_RETURN_BLACK(v, tex.inv_size.y, wrap_v, crop_v, store_result3);
APPLY_SMOOTHERSTEP_FILTER();
store_result3(result, tex2DGrad<float4>(tex.filtered_object, u, v, coord->dx, coord->dy));
}
extern "C" __device__ void tex_texel_float4_2d(
float result[4],
Texture_handler_base const *self_base,
unsigned texture_idx,
int const coord[2],
int const [2])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
store_result4(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
store_result4(result, tex2D<float4>(
tex.unfiltered_object,
float(coord[0]) * tex.inv_size.x,
float(coord[1]) * tex.inv_size.y));
}
extern "C" __device__ void tex_lookup_float4_3d(
float result[4],
Texture_handler_base const *self_base,
unsigned texture_idx,
float const coord[3],
Tex_wrap_mode wrap_u,
Tex_wrap_mode wrap_v,
Tex_wrap_mode wrap_w,
float const crop_u[2],
float const crop_v[2],
float const crop_w[2])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
store_result4(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
float u = coord[0], v = coord[1], w = coord[2];
WRAP_AND_CROP_OR_RETURN_BLACK(u, tex.inv_size.x, wrap_u, crop_u, store_result4);
WRAP_AND_CROP_OR_RETURN_BLACK(v, tex.inv_size.y, wrap_v, crop_v, store_result4);
WRAP_AND_CROP_OR_RETURN_BLACK(w, tex.inv_size.z, wrap_w, crop_w, store_result4);
store_result4(result, tex3D<float4>(tex.filtered_object, u, v, w));
}
extern "C" __device__ void tex_lookup_float3_3d(
float result[3],
Texture_handler_base const *self_base,
unsigned texture_idx,
float const coord[3],
Tex_wrap_mode wrap_u,
Tex_wrap_mode wrap_v,
Tex_wrap_mode wrap_w,
float const crop_u[2],
float const crop_v[2],
float const crop_w[2])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
store_result3(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
float u = coord[0], v = coord[1], w = coord[2];
WRAP_AND_CROP_OR_RETURN_BLACK(u, tex.inv_size.x, wrap_u, crop_u, store_result3);
WRAP_AND_CROP_OR_RETURN_BLACK(v, tex.inv_size.y, wrap_v, crop_v, store_result3);
WRAP_AND_CROP_OR_RETURN_BLACK(w, tex.inv_size.z, wrap_w, crop_w, store_result3);
store_result3(result, tex3D<float4>(tex.filtered_object, u, v, w));
}
extern "C" __device__ void tex_texel_float4_3d(
float result[4],
Texture_handler_base const *self_base,
unsigned texture_idx,
const int coord[3])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
store_result4(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
store_result4(result, tex3D<float4>(
tex.unfiltered_object,
float(coord[0]) * tex.inv_size.x,
float(coord[1]) * tex.inv_size.y,
float(coord[2]) * tex.inv_size.z));
}
extern "C" __device__ void tex_lookup_float4_cube(
float result[4],
Texture_handler_base const *self_base,
unsigned texture_idx,
float const coord[3])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
store_result4(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
store_result4(result, texCubemap<float4>(tex.filtered_object, coord[0], coord[1], coord[2]));
}
extern "C" __device__ void tex_lookup_float3_cube(
float result[3],
Texture_handler_base const *self_base,
unsigned texture_idx,
float const coord[3])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
store_result3(result, 0.0f);
return;
}
Texture const &tex = self->textures[texture_idx - 1];
store_result3(result, texCubemap<float4>(tex.filtered_object, coord[0], coord[1], coord[2]));
}
extern "C" __device__ void tex_resolution_2d(
int result[2],
Texture_handler_base const *self_base,
unsigned texture_idx,
int const [2])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if ( texture_idx == 0 || texture_idx - 1 >= self->num_textures ) {
result[0] = 0;
result[1] = 0;
return;
}
Texture const &tex = self->textures[texture_idx - 1];
result[0] = tex.size.x;
result[1] = tex.size.y;
}
__device__ inline unsigned sample_cdf(
const float* cdf,
unsigned cdf_size,
float xi)
{
unsigned li = 0;
unsigned ri = cdf_size - 1;
unsigned m = (li + ri) / 2;
while (ri > li)
{
if (xi < cdf[m])
ri = m;
else
li = m + 1;
m = (li + ri) / 2;
}
return m;
}
extern "C" __device__ float df_light_profile_evaluate(
Texture_handler_base const *self_base,
unsigned resource_idx,
float const theta_phi[2])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (resource_idx == 0 || resource_idx - 1 >= self->num_lightprofiles)
return 0.0f;
const Lightprofile& lp = self->lightprofiles[resource_idx - 1];
const float u = (theta_phi[0] - lp.theta_phi_start.x) *
lp.theta_phi_inv_delta.x / float(lp.angular_resolution.x - 1);
float phi = (theta_phi[1] > 0.0f) ? theta_phi[1] : (float(2.0 * M_PI) + theta_phi[1]);
phi = phi - lp.theta_phi_start.y -
floorf((phi - lp.theta_phi_start.y) * float(0.5 / M_PI)) * float(2.0 * M_PI);
const float v = phi * lp.theta_phi_inv_delta.y / float(lp.angular_resolution.y - 1);
if (u < 0.0f || u > 1.0f || v < 0.0f || v > 1.0f) return 0.0f;
return tex2D<float>(lp.eval_data, u, v) * lp.candela_multiplier;
}
extern "C" __device__ void df_light_profile_sample(
float result[3],
Texture_handler_base const *self_base,
unsigned resource_idx,
float const xi[3])
{
result[0] = -1.0f;
result[1] = -1.0f;
result[2] = 0.0f;
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (resource_idx == 0 || resource_idx - 1 >= self->num_lightprofiles)
return;
const Lightprofile& lp = self->lightprofiles[resource_idx - 1];
uint2 res = lp.angular_resolution;
float xi0 = xi[0];
const float* cdf_data_theta = lp.cdf_data;
unsigned idx_theta = sample_cdf(cdf_data_theta, res.x - 1, xi0);
float prob_theta = cdf_data_theta[idx_theta];
if (idx_theta > 0)
{
const float tmp = cdf_data_theta[idx_theta - 1];
prob_theta -= tmp;
xi0 -= tmp;
}
xi0 /= prob_theta;
float xi1 = xi[1];
const float* cdf_data_phi = cdf_data_theta + (res.x - 1)
+ (idx_theta * (res.y - 1));
const unsigned idx_phi = sample_cdf(cdf_data_phi, res.y - 1, xi1);
float prob_phi = cdf_data_phi[idx_phi];
if (idx_phi > 0)
{
const float tmp = cdf_data_phi[idx_phi - 1];
prob_phi -= tmp;
xi1 -= tmp;
}
xi1 /= prob_phi;
const float2 start = lp.theta_phi_start;
const float2 delta = lp.theta_phi_delta;
const float cos_theta_0 = cosf(start.x + float(idx_theta) * delta.x);
const float cos_theta_1 = cosf(start.x + float(idx_theta + 1u) * delta.x);
const float cos_theta = (1.0f - xi1) * cos_theta_0 + xi1 * cos_theta_1;
result[0] = acosf(cos_theta);
result[1] = start.y + (float(idx_phi) + xi0) * delta.y;
if (result[1] > float(2.0 * M_PI)) result[1] -= float(2.0 * M_PI);
if (result[1] > float(1.0 * M_PI)) result[1] = float(-2.0 * M_PI) + result[1];
result[2] = prob_theta * prob_phi / (delta.y * (cos_theta_0 - cos_theta_1));
}
extern "C" __device__ float df_light_profile_pdf(
Texture_handler_base const *self_base,
unsigned resource_idx,
float const theta_phi[2])
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (resource_idx == 0 || resource_idx - 1 >= self->num_lightprofiles)
return 0.0f;
const Lightprofile& lp = self->lightprofiles[resource_idx - 1];
const uint2 res = lp.angular_resolution;
const float* cdf_data_theta = lp.cdf_data;
const float theta = theta_phi[0] - lp.theta_phi_start.x;
const int idx_theta = int(theta * lp.theta_phi_inv_delta.x);
float phi = (theta_phi[1] > 0.0f) ? theta_phi[1] : (float(2.0 * M_PI) + theta_phi[1]);
phi = phi - lp.theta_phi_start.y -
floorf((phi - lp.theta_phi_start.y) * float(0.5 / M_PI)) * float(2.0 * M_PI);
const int idx_phi = int(phi * lp.theta_phi_inv_delta.y);
if (idx_theta < 0 || idx_theta > (res.x - 2) || idx_phi < 0 || idx_phi >(res.x - 2))
return 0.0f;
float prob_theta = cdf_data_theta[idx_theta];
if (idx_theta > 0)
{
const float tmp = cdf_data_theta[idx_theta - 1];
prob_theta -= tmp;
}
const float* cdf_data_phi = cdf_data_theta
+ (res.x - 1)
+ (idx_theta * (res.y - 1));
float prob_phi = cdf_data_phi[idx_phi];
if (idx_phi > 0)
{
const float tmp = cdf_data_phi[idx_phi - 1];
prob_phi -= tmp;
}
const float2 start = lp.theta_phi_start;
const float2 delta = lp.theta_phi_delta;
const float cos_theta_0 =
cos(start.x +
float(idx_theta) * delta.x);
const float cos_theta_1 =
cos(start.x +
float(idx_theta + 1u) * delta.x);
return prob_theta * prob_phi / (delta.y * (cos_theta_0 - cos_theta_1));
}
extern "C" __device__ void df_bsdf_measurement_resolution(
unsigned result[3],
Texture_handler_base const *self_base,
unsigned resource_idx,
Mbsdf_part part)
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (resource_idx == 0 || resource_idx - 1 >= self->num_mbsdfs)
{
result[0] = 0;
result[1] = 0;
result[2] = 0;
return;
}
Mbsdf const &bm = self->mbsdfs[resource_idx - 1];
const unsigned part_index = static_cast<unsigned>(part);
if (bm.has_data[part_index] == 0)
{
result[0] = 0;
result[1] = 0;
result[2] = 0;
return;
}
result[0] = bm.angular_resolution[part_index].x;
result[1] = bm.angular_resolution[part_index].y;
result[2] = bm.num_channels[part_index];
}
__device__ inline float3 bsdf_compute_uvw(const float theta_phi_in[2],
const float theta_phi_out[2])
{
float u = theta_phi_out[1] - theta_phi_in[1];
if (u < 0.0) u += float(2.0 * M_PI);
if (u > float(1.0 * M_PI)) u = float(2.0 * M_PI) - u;
u *= M_ONE_OVER_PI;
const float v = theta_phi_out[0] * float(2.0 / M_PI);
const float w = theta_phi_in[0] * float(2.0 / M_PI);
return make_float3(u, v, w);
}
template<typename T>
__device__ inline T bsdf_measurement_lookup(const cudaTextureObject_t& eval_volume,
const float theta_phi_in[2],
const float theta_phi_out[2])
{
const float3 uvw = bsdf_compute_uvw(theta_phi_in, theta_phi_out);
return tex3D<T>(eval_volume, uvw.x, uvw.y, uvw.z);
}
extern "C" __device__ void df_bsdf_measurement_evaluate(
float result[3],
Texture_handler_base const *self_base,
unsigned resource_idx,
float const theta_phi_in[2],
float const theta_phi_out[2],
Mbsdf_part part)
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (resource_idx == 0 || resource_idx - 1 >= self->num_mbsdfs)
{
store_result3(result, 0.0f);
return;
}
const Mbsdf& bm = self->mbsdfs[resource_idx - 1];
const unsigned part_index = static_cast<unsigned>(part);
if (bm.has_data[part_index] == 0)
{
store_result3(result, 0.0f);
return;
}
if (bm.num_channels[part_index] == 3)
{
const float4 sample = bsdf_measurement_lookup<float4>(
bm.eval_data[part_index], theta_phi_in, theta_phi_out);
store_result3(result, sample.x, sample.y, sample.z);
}
else
{
const float sample = bsdf_measurement_lookup<float>(
bm.eval_data[part_index], theta_phi_in, theta_phi_out);
store_result3(result, sample);
}
}
extern "C" __device__ void df_bsdf_measurement_sample(
float result[3],
Texture_handler_base const *self_base,
unsigned resource_idx,
float const theta_phi_out[2],
float const xi[3],
Mbsdf_part part)
{
result[0] = -1.0f;
result[1] = -1.0f;
result[2] = 0.0f;
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (resource_idx == 0 || resource_idx - 1 >= self->num_mbsdfs)
return;
const Mbsdf& bm = self->mbsdfs[resource_idx - 1];
unsigned part_index = static_cast<unsigned>(part);
if (bm.has_data[part_index] == 0)
return;
uint2 res = bm.angular_resolution[part_index];
const float* sample_data = bm.sample_data[part_index];
unsigned idx_theta_in = unsigned(theta_phi_out[0] * M_ONE_OVER_PI * 2.0f * float(res.x));
idx_theta_in =
min(idx_theta_in, res.x - 1);
float xi0 = xi[0];
const float* cdf_theta = sample_data + idx_theta_in * res.x;
unsigned idx_theta_out = sample_cdf(cdf_theta, res.x, xi0);
float prob_theta = cdf_theta[idx_theta_out];
if (idx_theta_out > 0)
{
const float tmp = cdf_theta[idx_theta_out - 1];
prob_theta -= tmp;
xi0 -= tmp;
}
xi0 /= prob_theta;
float xi1 = xi[1];
const float* cdf_phi = sample_data +
(res.x * res.x) +
(idx_theta_in * res.x + idx_theta_out) * res.y;
const bool flip = (xi1 > 0.5f);
if (flip)
xi1 = 1.0f - xi1;
xi1 *= 2.0f;
unsigned idx_phi_out = sample_cdf(cdf_phi, res.y, xi1);
float prob_phi = cdf_phi[idx_phi_out];
if (idx_phi_out > 0)
{
const float tmp = cdf_phi[idx_phi_out - 1];
prob_phi -= tmp;
xi1 -= tmp;
}
xi1 /= prob_phi;
const float2 inv_res = bm.inv_angular_resolution[part_index];
const float s_theta = float(0.5 * M_PI) * inv_res.x;
const float s_phi = float(1.0 * M_PI) * inv_res.y;
const float cos_theta_0 = cosf(float(idx_theta_out) * s_theta);
const float cos_theta_1 = cosf(float(idx_theta_out + 1u) * s_theta);
const float cos_theta = cos_theta_0 * (1.0f - xi1) + cos_theta_1 * xi1;
result[0] = acosf(cos_theta);
result[1] = (float(idx_phi_out) + xi0) * s_phi;
if (flip)
result[1] = float(2.0 * M_PI) - result[1];
result[1] += (theta_phi_out[1] > 0) ? theta_phi_out[1] : (float(2.0 * M_PI) + theta_phi_out[1]);
if (result[1] > float(2.0 * M_PI)) result[1] -= float(2.0 * M_PI);
if (result[1] > float(1.0 * M_PI)) result[1] = float(-2.0 * M_PI) + result[1];
result[2] = prob_theta * prob_phi * 0.5f
/ (s_phi * (cos_theta_0 - cos_theta_1));
}
extern "C" __device__ float df_bsdf_measurement_pdf(
Texture_handler_base const *self_base,
unsigned resource_idx,
float const theta_phi_in[2],
float const theta_phi_out[2],
Mbsdf_part part)
{
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (resource_idx == 0 || resource_idx - 1 >= self->num_mbsdfs)
return 0.0f;
const Mbsdf& bm = self->mbsdfs[resource_idx - 1];
unsigned part_index = static_cast<unsigned>(part);
if (bm.has_data[part_index] == 0)
return 0.0f;
const float* sample_data = bm.sample_data[part_index];
uint2 res = bm.angular_resolution[part_index];
float3 uvw = bsdf_compute_uvw(theta_phi_in, theta_phi_out);
unsigned idx_theta_in = unsigned(theta_phi_in[0] * M_ONE_OVER_PI * 2.0f * float(res.x));
unsigned idx_theta_out = unsigned(theta_phi_out[0] * M_ONE_OVER_PI * 2.0f * float(res.x));
unsigned idx_phi_out = unsigned(uvw.x * float(res.y));
idx_theta_in =
min(idx_theta_in, res.x - 1);
idx_theta_out =
min(idx_theta_out, res.x - 1);
idx_phi_out =
min(idx_phi_out, res.y - 1);
const float* cdf_theta = sample_data + idx_theta_in * res.x;
float prob_theta = cdf_theta[idx_theta_out];
if (idx_theta_out > 0)
{
const float tmp = cdf_theta[idx_theta_out - 1];
prob_theta -= tmp;
}
const float* cdf_phi = sample_data +
(res.x * res.x) +
(idx_theta_in * res.x + idx_theta_out) * res.y;
float prob_phi = cdf_phi[idx_phi_out];
if (idx_phi_out > 0)
{
const float tmp = cdf_phi[idx_phi_out - 1];
prob_phi -= tmp;
}
float2 inv_res = bm.inv_angular_resolution[part_index];
const float s_theta = float(0.5 * M_PI) * inv_res.x;
const float s_phi = float(1.0 * M_PI) * inv_res.y;
const float cos_theta_0 = cosf(float(idx_theta_out) * s_theta);
const float cos_theta_1 = cosf(float(idx_theta_out + 1u) * s_theta);
return prob_theta * prob_phi * 0.5f
/ (s_phi * (cos_theta_0 - cos_theta_1));
}
__device__ inline void df_bsdf_measurement_albedo(
float result[2],
Texture_handler const *self,
unsigned resource_idx,
float const theta_phi[2],
Mbsdf_part part)
{
const Mbsdf& bm = self->mbsdfs[resource_idx - 1];
const unsigned part_index = static_cast<unsigned>(part);
if (bm.has_data[part_index] == 0)
return;
const uint2 res = bm.angular_resolution[part_index];
unsigned idx_theta = unsigned(theta_phi[0] * float(2.0 / M_PI) * float(res.x));
idx_theta =
min(idx_theta, res.x - 1u);
result[0] = bm.albedo_data[part_index][idx_theta];
result[1] = bm.max_albedo[part_index];
}
extern "C" __device__ void df_bsdf_measurement_albedos(
float result[4],
Texture_handler_base const *self_base,
unsigned resource_idx,
float const theta_phi[2])
{
result[0] = 0.0f;
result[1] = 0.0f;
result[2] = 0.0f;
result[3] = 0.0f;
Texture_handler const *self = static_cast<Texture_handler const *>(self_base);
if (resource_idx == 0 || resource_idx - 1 >= self->num_mbsdfs)
return;
df_bsdf_measurement_albedo(
df_bsdf_measurement_albedo(
}
tex_lookup_float4_2d,
tex_lookup_float3_2d,
tex_texel_float4_2d,
tex_lookup_float4_3d,
tex_lookup_float3_3d,
tex_texel_float4_3d,
tex_lookup_float4_cube,
tex_lookup_float3_cube,
tex_resolution_2d,
df_light_profile_evaluate,
df_light_profile_sample,
df_light_profile_pdf,
df_bsdf_measurement_resolution,
df_bsdf_measurement_evaluate,
df_bsdf_measurement_sample,
df_bsdf_measurement_pdf,
df_bsdf_measurement_albedos
};
tex_lookup_deriv_float4_2d,
tex_lookup_deriv_float3_2d,
tex_texel_float4_2d,
tex_lookup_float4_3d,
tex_lookup_float3_3d,
tex_texel_float4_3d,
tex_lookup_float4_cube,
tex_lookup_float3_cube,
tex_resolution_2d,
df_light_profile_evaluate,
df_light_profile_sample,
df_light_profile_pdf,
df_bsdf_measurement_resolution,
df_bsdf_measurement_evaluate,
df_bsdf_measurement_sample,
df_bsdf_measurement_pdf,
df_bsdf_measurement_albedos
};
#endif // TEXTURE_SUPPORT_CUDA_H
Source Code Location: examples/mdl_sdk/shared/example_cuda_shared.h
#ifndef EXAMPLE_CUDA_SHARED_H
#define EXAMPLE_CUDA_SHARED_H
#include <string>
#include <vector>
#include <sstream>
#include <iostream>
#define _USE_MATH_DEFINES
#include "example_shared.h"
#include <cuda.h>
#ifdef OPENGL_INTEROP
#include <GL/glew.h>
#include <GLFW/glfw3.h>
#include <cudaGL.h>
#endif
#include <cuda_runtime.h>
#include <vector_functions.h>
struct Texture
{
explicit Texture(cudaTextureObject_t filtered_object,
cudaTextureObject_t unfiltered_object,
uint3 size)
: filtered_object(filtered_object)
, unfiltered_object(unfiltered_object)
, size(size)
, inv_size(make_float3(1.0f / size.x, 1.0f / size.y, 1.0f / size.z))
{}
cudaTextureObject_t filtered_object;
cudaTextureObject_t unfiltered_object;
uint3 size;
float3 inv_size;
};
struct Mbsdf
{
explicit Mbsdf()
{
for (unsigned i = 0; i < 2; ++i) {
has_data[i] = 0u;
eval_data[i] = 0;
sample_data[i] = 0;
albedo_data[i] = 0;
this->max_albedo[i] = 0.0f;
angular_resolution[i] = make_uint2(0u, 0u);
inv_angular_resolution[i] = make_float2(0.0f, 0.0f);
num_channels[i] = 0;
}
}
const uint2& angular_resolution,
unsigned num_channels)
{
unsigned part_idx = static_cast<unsigned>(part);
this->has_data[part_idx] = 1u;
this->angular_resolution[part_idx] = angular_resolution;
this->inv_angular_resolution[part_idx] = make_float2(1.0f / float(angular_resolution.x),
1.0f / float(angular_resolution.y));
this->num_channels[part_idx] = num_channels;
}
unsigned has_data[2];
cudaTextureObject_t eval_data[2];
float max_albedo[2];
float* sample_data[2];
float* albedo_data[2];
uint2 angular_resolution[2];
float2 inv_angular_resolution[2];
unsigned num_channels[2];
};
struct Lightprofile
{
explicit Lightprofile(
uint2 angular_resolution = make_uint2(0, 0),
float2 theta_phi_start = make_float2(0.0f, 0.0f),
float2 theta_phi_delta = make_float2(0.0f, 0.0f),
float candela_multiplier = 0.0f,
float total_power = 0.0f,
cudaTextureObject_t eval_data = 0,
float *cdf_data = nullptr)
: angular_resolution(angular_resolution)
, theta_phi_start(theta_phi_start)
, theta_phi_delta(theta_phi_delta)
, theta_phi_inv_delta(make_float2(0.0f, 0.0f))
, candela_multiplier(candela_multiplier)
, total_power(total_power)
, eval_data(eval_data)
, cdf_data(cdf_data)
{
theta_phi_inv_delta.x = theta_phi_delta.x ? (1.f / theta_phi_delta.x) : 0.f;
theta_phi_inv_delta.y = theta_phi_delta.y ? (1.f / theta_phi_delta.y) : 0.f;
}
uint2 angular_resolution;
float2 theta_phi_start;
float2 theta_phi_delta;
float2 theta_phi_inv_delta;
float candela_multiplier;
float total_power;
cudaTextureObject_t eval_data;
float* cdf_data;
};
struct Target_code_data
{
Target_code_data(
size_t num_textures,
CUdeviceptr textures,
size_t num_mbsdfs,
CUdeviceptr mbsdfs,
size_t num_lightprofiles,
CUdeviceptr lightprofiles,
CUdeviceptr ro_data_segment)
: num_textures(num_textures)
, textures(textures)
, num_mbsdfs(num_mbsdfs)
, mbsdfs(mbsdfs)
, num_lightprofiles(num_lightprofiles)
, lightprofiles(lightprofiles)
, ro_data_segment(ro_data_segment)
{}
size_t num_textures;
CUdeviceptr textures;
size_t num_mbsdfs;
CUdeviceptr mbsdfs;
size_t num_lightprofiles;
CUdeviceptr lightprofiles;
CUdeviceptr ro_data_segment;
};
template <typename T>
{
stream << val;
}
#define check_cuda_success(expr) \
do { \
int err = (expr); \
if (err != 0) { \
fprintf(stderr, "CUDA error %d in file %s, line %u: \"%s\".\n", \
err, __FILE__, __LINE__, #expr); \
keep_console_open(); \
cudaDeviceReset(); \
exit(EXIT_FAILURE); \
} \
} while (false)
CUcontext init_cuda(
#ifdef OPENGL_INTEROP
const bool opengl_interop
#endif
)
{
CUdevice cu_device;
CUcontext cu_context;
check_cuda_success(cuInit(0));
#if defined(OPENGL_INTEROP) && !defined(__APPLE__)
if (opengl_interop) {
unsigned int num_cu_devices;
check_cuda_success(cuGLGetDevices(&num_cu_devices, &cu_device, 1, CU_GL_DEVICE_LIST_ALL));
}
else
#endif
check_cuda_success(cuDeviceGet(&cu_device, 0));
check_cuda_success(cuCtxCreate(&cu_context, 0, cu_device));
cudaDeviceSetLimit(cudaLimitPrintfFifoSize, 16 * 1024 * 1024);
return cu_context;
}
void uninit_cuda(CUcontext cuda_context)
{
check_cuda_success(cuCtxDestroy(cuda_context));
}
template<typename T> struct Resource_deleter {
};
template<> struct Resource_deleter<cudaArray_t> {
void operator()(cudaArray_t res) { check_cuda_success(cudaFreeArray(res)); }
};
template<> struct Resource_deleter<cudaMipmappedArray_t> {
void operator()(cudaMipmappedArray_t res) { check_cuda_success(cudaFreeMipmappedArray(res)); }
};
template<> struct Resource_deleter<Texture> {
void operator()(Texture &res) {
check_cuda_success(cudaDestroyTextureObject(res.filtered_object));
check_cuda_success(cudaDestroyTextureObject(res.unfiltered_object));
}
};
template<> struct Resource_deleter<Mbsdf> {
void operator()(Mbsdf &res) {
for (size_t i = 0; i < 2; ++i) {
if (res.has_data[i] != 0u) {
check_cuda_success(cudaDestroyTextureObject(res.eval_data[i]));
check_cuda_success(cuMemFree(reinterpret_cast<CUdeviceptr>(res.sample_data[i])));
check_cuda_success(cuMemFree(reinterpret_cast<CUdeviceptr>(res.albedo_data[i])));
}
}
}
};
template<> struct Resource_deleter<Lightprofile> {
void operator()(Lightprofile res) {
if (res.cdf_data)
check_cuda_success(cuMemFree((CUdeviceptr)res.cdf_data));
}
};
template<> struct Resource_deleter<Target_code_data> {
void operator()(Target_code_data &res) {
if (res.textures)
check_cuda_success(cuMemFree(res.textures));
if (res.ro_data_segment)
check_cuda_success(cuMemFree(res.ro_data_segment));
}
};
template<> struct Resource_deleter<CUdeviceptr> {
void operator()(CUdeviceptr res) {
if (res != 0)
check_cuda_success(cuMemFree(res));
}
};
template<typename T, typename D = Resource_deleter<T> >
struct Resource_handle {
Resource_handle(T res) : m_res(res) {}
~Resource_handle() {
D deleter;
deleter(m_res);
}
T &get() { return m_res; }
T const &get() const { return m_res; }
void set(T res) { m_res = res; }
private:
Resource_handle(Resource_handle const &);
Resource_handle &operator=(Resource_handle const &);
private:
T m_res;
};
template<typename T, typename C = std::vector<T>, typename D = Resource_deleter<T> >
struct Resource_container {
Resource_container() : m_cont() {}
~Resource_container() {
D deleter;
typedef typename C::iterator I;
for (I it(m_cont.begin()), end(m_cont.end()); it != end; ++it) {
T &r = *it;
deleter(r);
}
}
C
const &
operator*()
const {
return m_cont; }
C *operator->() { return &m_cont; }
C const *operator->() const { return &m_cont; }
private:
Resource_container(Resource_container const &);
Resource_container &operator=(Resource_container const &);
private:
C m_cont;
};
CUdeviceptr gpu_mem_dup(void const *data, size_t size)
{
CUdeviceptr device_ptr;
check_cuda_success(cuMemAlloc(&device_ptr, size));
check_cuda_success(cuMemcpyHtoD(device_ptr, data, size));
return device_ptr;
}
template <typename T>
CUdeviceptr gpu_mem_dup(Resource_handle<T> const *data, size_t size)
{
return gpu_mem_dup((void *)data->get(), size);
}
template<typename T>
{
return gpu_mem_dup(&data[0], data.
size() *
sizeof(T));
}
template<typename T, typename C>
CUdeviceptr gpu_mem_dup(Resource_container<T,C> const &cont)
{
return gpu_mem_dup(*cont);
}
class Material_gpu_context
{
public:
Material_gpu_context(bool enable_derivatives)
: m_enable_derivatives(enable_derivatives)
, m_device_target_code_data_list(0)
, m_device_target_argument_block_list(0)
{
m_target_argument_block_list->push_back(0);
}
bool prepare_target_code_data(
CUdeviceptr get_device_target_code_data_list();
CUdeviceptr get_device_target_argument_block_list();
CUdeviceptr get_device_target_argument_block(size_t i)
{
if (i + 1 >= m_target_argument_block_list->size())
return 0;
return (*m_target_argument_block_list)[i + 1];
}
size_t get_argument_block_count() const
{
return m_own_arg_blocks.size();
}
size_t get_bsdf_argument_block_index(size_t i) const
{
if (i >= m_bsdf_arg_block_indices.size()) return size_t(~0);
return m_bsdf_arg_block_indices[i];
}
{
if (i >= m_own_arg_blocks.size())
return m_own_arg_blocks[i];
}
{
if (i >= m_arg_block_layouts.size())
return m_arg_block_layouts[i];
}
void update_device_argument_block(size_t i);
private:
bool prepare_texture(
bool prepare_mbsdf(
bool prepare_lightprofile(
bool m_enable_derivatives;
Resource_handle<CUdeviceptr> m_device_target_code_data_list;
Resource_container<Target_code_data> m_target_code_data_list;
Resource_handle<CUdeviceptr> m_device_target_argument_block_list;
Resource_container<CUdeviceptr> m_target_argument_block_list;
Resource_container<Texture> m_all_textures;
Resource_container<Mbsdf> m_all_mbsdfs;
Resource_container<Lightprofile> m_all_lightprofiles;
Resource_container<cudaArray_t> m_all_texture_arrays;
Resource_container<cudaMipmappedArray_t> m_all_texture_mipmapped_arrays;
};
CUdeviceptr Material_gpu_context::get_device_target_code_data_list()
{
if (!m_device_target_code_data_list.get())
m_device_target_code_data_list.set(gpu_mem_dup(m_target_code_data_list));
return m_device_target_code_data_list.get();
}
CUdeviceptr Material_gpu_context::get_device_target_argument_block_list()
{
if (!m_device_target_argument_block_list.get())
m_device_target_argument_block_list.set(gpu_mem_dup(m_target_argument_block_list));
return m_device_target_argument_block_list.get();
}
void Material_gpu_context::copy_canvas_to_cuda_array(
cudaArray_t device_array,
{
check_cuda_success(cudaMemcpyToArray(
device_array,
0,
0,
data,
cudaMemcpyHostToDevice));
}
bool Material_gpu_context::prepare_texture(
{
char const *image_type = image->
get_type();
if (image->is_uvtile()) {
return false;
}
return false;
}
if (texture->get_effective_gamma() != 1.0f) {
image_api->
convert(canvas.get(),
"Color"));
gamma_canvas->set_gamma(texture->get_effective_gamma());
canvas = gamma_canvas;
}
else if (
strcmp(image_type,
"Color") != 0 &&
strcmp(image_type,
"Float32<4>") != 0) {
canvas = image_api->
convert(canvas.get(),
"Color");
}
cudaChannelFormatDesc channel_desc = cudaCreateChannelDesc<float4>();
cudaResourceDesc res_desc;
memset(&res_desc, 0,
sizeof(res_desc));
if (texture_shape == mi::neuraylib::ITarget_code::Texture_shape_cube ||
texture_shape == mi::neuraylib::ITarget_code::Texture_shape_3d) {
if (texture_shape == mi::neuraylib::ITarget_code::Texture_shape_cube &&
tex_layers != 6) {
std::cerr <<
"Invalid number of layers (" << tex_layers
<<
"), cubemaps must have 6 layers!" <<
std::endl;
return false;
}
cudaExtent extent = make_cudaExtent(tex_width, tex_height, tex_layers);
cudaArray_t device_tex_array;
check_cuda_success(cudaMalloc3DArray(
&device_tex_array, &channel_desc, extent,
texture_shape == mi::neuraylib::ITarget_code::Texture_shape_cube ?
cudaArrayCubemap : 0));
cudaMemcpy3DParms copy_params;
memset(©_params, 0,
sizeof(copy_params));
copy_params.dstArray = device_tex_array;
copy_params.extent = make_cudaExtent(tex_width, tex_height, 1);
copy_params.kind = cudaMemcpyHostToDevice;
for (
mi::Uint32 layer = 0; layer < tex_layers; ++layer) {
float const *data = static_cast<float const *>(tile->get_data());
copy_params.srcPtr = make_cudaPitchedPtr(
const_cast<float *>(data), tex_width * sizeof(float) * 4,
tex_width, tex_height);
copy_params.dstPos = make_cudaPos(0, 0, layer);
check_cuda_success(cudaMemcpy3D(©_params));
}
res_desc.resType = cudaResourceTypeArray;
res_desc.res.array.array = device_tex_array;
m_all_texture_arrays->push_back(device_tex_array);
} else if (m_enable_derivatives) {
cudaExtent extent = make_cudaExtent(tex_width, tex_height, 0);
cudaMipmappedArray_t device_tex_miparray;
check_cuda_success(cudaMallocMipmappedArray(
&device_tex_miparray, &channel_desc, extent, num_levels));
for (
mi::Uint32 level = 0; level < num_levels; ++level) {
if (level == 0)
level_canvas = canvas;
else {
}
cudaArray_t device_level_array;
cudaGetMipmappedArrayLevel(&device_level_array, device_tex_miparray, level);
copy_canvas_to_cuda_array(device_level_array, level_canvas.
get());
}
res_desc.resType = cudaResourceTypeMipmappedArray;
res_desc.res.mipmap.mipmap = device_tex_miparray;
m_all_texture_mipmapped_arrays->push_back(device_tex_miparray);
} else {
cudaArray_t device_tex_array;
check_cuda_success(cudaMallocArray(
&device_tex_array, &channel_desc, tex_width, tex_height));
copy_canvas_to_cuda_array(device_tex_array, canvas.get());
res_desc.resType = cudaResourceTypeArray;
res_desc.res.array.array = device_tex_array;
m_all_texture_arrays->push_back(device_tex_array);
}
cudaTextureAddressMode addr_mode =
texture_shape == mi::neuraylib::ITarget_code::Texture_shape_cube
? cudaAddressModeClamp
: cudaAddressModeWrap;
cudaTextureDesc tex_desc;
memset(&tex_desc, 0,
sizeof(tex_desc));
tex_desc.addressMode[0] = addr_mode;
tex_desc.addressMode[1] = addr_mode;
tex_desc.addressMode[2] = addr_mode;
tex_desc.filterMode = cudaFilterModeLinear;
tex_desc.readMode = cudaReadModeElementType;
tex_desc.normalizedCoords = 1;
if (res_desc.resType == cudaResourceTypeMipmappedArray) {
tex_desc.mipmapFilterMode = cudaFilterModeLinear;
tex_desc.maxAnisotropy = 16;
tex_desc.minMipmapLevelClamp = 0.f;
tex_desc.maxMipmapLevelClamp = 1000.f;
}
cudaTextureObject_t tex_obj = 0;
check_cuda_success(cudaCreateTextureObject(&tex_obj, &res_desc, &tex_desc, nullptr));
cudaTextureObject_t tex_obj_unfilt = 0;
if (texture_shape != mi::neuraylib::ITarget_code::Texture_shape_cube) {
tex_desc.addressMode[0] = cudaAddressModeBorder;
tex_desc.addressMode[1] = cudaAddressModeBorder;
tex_desc.addressMode[2] = cudaAddressModeBorder;
tex_desc.filterMode = cudaFilterModePoint;
check_cuda_success(cudaCreateTextureObject(
&tex_obj_unfilt, &res_desc, &tex_desc, nullptr));
}
tex_obj,
tex_obj_unfilt,
make_uint3(tex_width, tex_height, tex_layers)));
m_all_textures->push_back(textures.
back());
return true;
}
namespace
{
{
switch (part)
{
break;
break;
}
if (!dataset)
return true;
uint2 res;
mbsdf_cuda_representation.Add(part, res, num_channels);
const unsigned int cdf_theta_size = res.x * res.x;
const unsigned sample_data_size = cdf_theta_size + cdf_theta_size * res.y;
float* sample_data = new float[sample_data_size];
float* albedo_data = new float[res.x];
float* sample_data_theta = sample_data;
float* sample_data_phi = sample_data + cdf_theta_size;
const float s_theta = (float) (M_PI * 0.5) / float(res.x);
const float s_phi = (float) (M_PI) / float(res.y);
float max_albedo = 0.0f;
for (unsigned int t_in = 0; t_in < res.x; ++t_in)
{
float sum_theta = 0.0f;
float sintheta0_sqd = 0.0f;
for (unsigned int t_out = 0; t_out < res.x; ++t_out)
{
const float sintheta1 = sinf(float(t_out + 1) * s_theta);
const float sintheta1_sqd = sintheta1 * sintheta1;
const float mu = (sintheta1_sqd - sintheta0_sqd) * s_phi * 0.5f;
sintheta0_sqd = sintheta1_sqd;
const unsigned int offset_phi = (t_in * res.x + t_out) * res.y;
const unsigned int offset_phi2 = (t_out * res.x + t_in) * res.y;
float sum_phi = 0.0f;
for (unsigned int p_out = 0; p_out < res.y; ++p_out)
{
const unsigned int idx = offset_phi + p_out;
const unsigned int idx2 = offset_phi2 + p_out;
float value = 0.0f;
if (num_channels == 3)
{
value =
fmax(fmaxf(src_data[3 * idx + 0], src_data[3 * idx + 1]),
fmaxf(src_data[3 * idx + 2], 0.0f))
+
fmax(fmaxf(src_data[3 * idx2 + 0], src_data[3 * idx2 + 1]),
fmaxf(src_data[3 * idx2 + 2], 0.0f));
}
else
{
value = fmaxf(src_data[idx], 0.0f) + fmaxf(src_data[idx2], 0.0f);
}
sum_phi += value * mu;
sample_data_phi[idx] = sum_phi;
}
for (unsigned int p_out = 0; p_out < res.y; ++p_out)
{
const unsigned int idx = offset_phi + p_out;
sample_data_phi[idx] = sample_data_phi[idx] / sum_phi;
}
sum_theta += sum_phi;
sample_data_theta[t_in * res.x + t_out] = sum_theta;
}
if (sum_theta > max_albedo)
max_albedo = sum_theta;
albedo_data[t_in] = sum_theta;
for (unsigned int t_out = 0; t_out < res.x; ++t_out)
{
const unsigned int idx = t_in * res.x + t_out;
sample_data_theta[idx] = sample_data_theta[idx] / sum_theta;
}
}
CUdeviceptr sample_obj = 0;
check_cuda_success(cuMemAlloc(&sample_obj, sample_data_size * sizeof(float)));
check_cuda_success(cuMemcpyHtoD(sample_obj, sample_data, sample_data_size * sizeof(float)));
delete[] sample_data;
CUdeviceptr albedo_obj = 0;
check_cuda_success(cuMemAlloc(&albedo_obj, res.x * sizeof(float)));
check_cuda_success(cuMemcpyHtoD(albedo_obj, albedo_data, res.x * sizeof(float)));
delete[] albedo_data;
mbsdf_cuda_representation.sample_data[part] = reinterpret_cast<float*>(sample_obj);
mbsdf_cuda_representation.albedo_data[part] = reinterpret_cast<float*>(albedo_obj);
mbsdf_cuda_representation.max_albedo[part] = max_albedo;
unsigned lookup_channels = (num_channels == 3) ? 4 : 1;
float* lookup_data = new float[lookup_channels * res.y * res.x * res.x];
for (unsigned int t_in = 0; t_in < res.x; ++t_in)
{
for (unsigned int t_out = 0; t_out < res.x; ++t_out)
{
const unsigned int offset_phi = (t_in * res.x + t_out) * res.y;
const unsigned int offset_phi2 = (t_out * res.x + t_in) * res.y;
for (unsigned int p_out = 0; p_out < res.y; ++p_out)
{
const unsigned int idx = offset_phi + p_out;
const unsigned int idx2 = offset_phi2 + p_out;
if (num_channels == 3)
{
lookup_data[4*idx+0] = (src_data[3*idx+0] + src_data[3*idx2+0]) * 0.5f;
lookup_data[4*idx+1] = (src_data[3*idx+1] + src_data[3*idx2+1]) * 0.5f;
lookup_data[4*idx+2] = (src_data[3*idx+2] + src_data[3*idx2+2]) * 0.5f;
lookup_data[4*idx+3] = 1.0f;
}
else
{
lookup_data[idx] = (src_data[idx] + src_data[idx2]) * 0.5f;
}
}
}
}
cudaArray_t device_mbsdf_data;
cudaChannelFormatDesc channel_desc = (num_channels == 3
? cudaCreateChannelDesc<float4>()
: cudaCreateChannelDesc<float>());
cudaExtent extent = make_cudaExtent(res.y, res.x, res.x);
check_cuda_success(cudaMalloc3DArray(&device_mbsdf_data, &channel_desc, extent, 0));
cudaMemcpy3DParms copy_params;
memset(©_params, 0,
sizeof(copy_params));
copy_params.srcPtr = make_cudaPitchedPtr(
(void*)(lookup_data),
res.y * lookup_channels * sizeof(float),
res.y,
res.x);
copy_params.dstArray = device_mbsdf_data;
copy_params.extent = extent;
copy_params.kind = cudaMemcpyHostToDevice;
check_cuda_success(cudaMemcpy3D(©_params));
delete[] lookup_data;
cudaResourceDesc texRes;
memset(&texRes, 0,
sizeof(cudaResourceDesc));
texRes.resType = cudaResourceTypeArray;
texRes.res.array.array = device_mbsdf_data;
cudaTextureDesc texDescr;
memset(&texDescr, 0,
sizeof(cudaTextureDesc));
texDescr.normalizedCoords = 1;
texDescr.filterMode = cudaFilterModeLinear;
texDescr.addressMode[0] = cudaAddressModeClamp;
texDescr.addressMode[1] = cudaAddressModeClamp;
texDescr.addressMode[2] = cudaAddressModeClamp;
texDescr.readMode = cudaReadModeElementType;
cudaTextureObject_t eval_tex_obj;
check_cuda_success(cudaCreateTextureObject(&eval_tex_obj, &texRes, &texDescr, nullptr));
mbsdf_cuda_representation.eval_data[part] = eval_tex_obj;
return true;
}
}
bool Material_gpu_context::prepare_mbsdf(
{
Mbsdf mbsdf_cuda;
return false;
return false;
m_all_mbsdfs->push_back(mbsdfs.
back());
return true;
}
bool Material_gpu_context::prepare_lightprofile(
{
uint2 res = make_uint2(lprof_nr->get_resolution_theta(), lprof_nr->get_resolution_phi());
float2 start = make_float2(lprof_nr->get_theta(0), lprof_nr->get_phi(0));
float2 delta = make_float2(lprof_nr->get_theta(1) - start.x, lprof_nr->get_phi(1) - start.y);
const float* data = lprof_nr->
get_data();
size_t cdf_data_size = (res.x - 1) + (res.x - 1) * (res.y - 1);
float* cdf_data = new float[cdf_data_size];
float debug_total_erea = 0.0f;
float sum_theta = 0.0f;
float total_power = 0.0f;
float cos_theta0 = cosf(start.x);
for (unsigned int t = 0; t < res.x - 1; ++t)
{
const float cos_theta1 = cosf(start.x + float(t + 1) * delta.x);
const float mu = cos_theta0 - cos_theta1;
cos_theta0 = cos_theta1;
float* cdf_data_phi = cdf_data + (res.x - 1) + t * (res.y - 1);
float sum_phi = 0.0f;
for (unsigned int p = 0; p < res.y - 1; ++p)
{
float value = data[p * res.x + t]
+ data[p * res.x + t + 1]
+ data[(p + 1) * res.x + t]
+ data[(p + 1) * res.x + t + 1];
sum_phi += value * mu;
cdf_data_phi[p] = sum_phi;
debug_total_erea += mu;
}
for (unsigned int p = 0; p < res.y - 2; ++p)
cdf_data_phi[p] = sum_phi ? (cdf_data_phi[p] / sum_phi) : 0.0f;
cdf_data_phi[res.y - 2] = 1.0f;
sum_theta += sum_phi;
cdf_data[t] = sum_theta;
}
total_power = sum_theta * 0.25f * delta.y;
for (unsigned int t = 0; t < res.x - 2; ++t)
cdf_data[t] = sum_theta ? (cdf_data[t] / sum_theta) : cdf_data[t];
cdf_data[res.x - 2] = 1.0f;
CUdeviceptr cdf_data_obj = 0;
check_cuda_success(cuMemAlloc(&cdf_data_obj, cdf_data_size * sizeof(float)));
check_cuda_success(cuMemcpyHtoD(cdf_data_obj, cdf_data, cdf_data_size * sizeof(float)));
delete[] cdf_data;
cudaArray_t device_lightprofile_data;
cudaChannelFormatDesc channel_desc = cudaCreateChannelDesc<float>();
check_cuda_success(cudaMallocArray(&device_lightprofile_data, &channel_desc, res.x, res.y));
check_cuda_success(cudaMemcpyToArray(device_lightprofile_data, 0, 0, data,
res.x * res.y * sizeof(float), cudaMemcpyHostToDevice));
cudaResourceDesc res_desc;
memset(&res_desc, 0,
sizeof(res_desc));
res_desc.resType = cudaResourceTypeArray;
res_desc.res.array.array = device_lightprofile_data;
cudaTextureDesc tex_desc;
memset(&tex_desc, 0,
sizeof(tex_desc));
tex_desc.addressMode[0] = cudaAddressModeClamp;
tex_desc.addressMode[1] = cudaAddressModeClamp;
tex_desc.addressMode[2] = cudaAddressModeClamp;
tex_desc.borderColor[0] = 1.0f;
tex_desc.borderColor[1] = 1.0f;
tex_desc.borderColor[2] = 1.0f;
tex_desc.borderColor[3] = 1.0f;
tex_desc.filterMode = cudaFilterModeLinear;
tex_desc.readMode = cudaReadModeElementType;
tex_desc.normalizedCoords = 1;
cudaTextureObject_t tex_obj = 0;
check_cuda_success(cudaCreateTextureObject(&tex_obj, &res_desc, &tex_desc, nullptr));
double multiplier = lprof_nr->get_candela_multiplier();
Lightprofile lprof(
res,
start,
delta,
float(multiplier),
float(total_power * multiplier),
tex_obj,
reinterpret_cast<float*>(cdf_data_obj));
m_all_lightprofiles->push_back(lightprofiles.
back());
return true;
}
bool Material_gpu_context::prepare_target_code_data(
{
check_success(m_device_target_code_data_list.get() == 0);
CUdeviceptr device_ro_data = 0;
device_ro_data = gpu_mem_dup(
}
CUdeviceptr device_textures = 0;
if (num_textures > 1) {
for (
mi::Size i = 1; i < num_textures; ++i) {
if (!prepare_texture(
transaction, image_api, target_code, i, textures))
return false;
}
device_textures = gpu_mem_dup(textures);
}
CUdeviceptr device_mbsdfs = 0;
if (num_mbsdfs > 1)
{
for (
mi::Size i = 1; i < num_mbsdfs; ++i)
{
if (!prepare_mbsdf(
transaction, target_code, i, mbsdfs))
return false;
}
device_mbsdfs = gpu_mem_dup(mbsdfs);
}
CUdeviceptr device_lightprofiles = 0;
if (num_lightprofiles > 1)
{
for (
mi::Size i = 1; i < num_lightprofiles; ++i)
{
if (!prepare_lightprofile(
transaction, target_code, i, lightprofiles))
return false;
}
device_lightprofiles = gpu_mem_dup(lightprofiles);
}
(*m_target_code_data_list).push_back(
Target_code_data(num_textures, device_textures,
num_mbsdfs, device_mbsdfs,
num_lightprofiles, device_lightprofiles,
device_ro_data));
CUdeviceptr dev_block = gpu_mem_dup(arg_block->get_data(), arg_block->get_size());
m_target_argument_block_list->push_back(dev_block);
m_arg_block_layouts.push_back(
}
if (kind != mi::neuraylib::ITarget_code::FK_DF_INIT ||
df_kind != mi::neuraylib::ITarget_code::DK_BSDF)
continue;
m_bsdf_arg_block_indices.push_back(
}
return true;
}
void Material_gpu_context::update_device_argument_block(size_t i)
{
CUdeviceptr device_ptr = get_device_target_argument_block(i);
if (device_ptr == 0) return;
check_cuda_success(cuMemcpyHtoD(
device_ptr, arg_block->get_data(), arg_block->get_size()));
}
class Material_compiler {
public:
Material_compiler(
unsigned num_texture_results,
#if !defined(MDL_SOURCE_RELEASE) && defined(MDL_ENABLE_INTERPRETER)
bool use_df_interpreter,
#endif
bool enable_derivatives);
bool add_material_subexpr(
const char* path,
const char* fname,
bool class_compilation=false);
bool add_material_df(
const char* path,
const char* base_fname,
bool class_compilation=false);
bool add_material(
mi::neuraylib::Target_function_description* function_descriptions,
bool class_compilation = false);
&get_material_defs()
{
return m_material_defs;
}
{
return m_compiled_materials;
}
private:
bool class_compilation);
private:
};
Material_compiler::Material_compiler(
unsigned num_texture_results,
#if !defined(MDL_SOURCE_RELEASE) && defined(MDL_ENABLE_INTERPRETER)
bool use_df_interpreter,
#endif
bool enable_derivatives)
, m_be_cuda_ptx(mdl_compiler->get_backend(mi::neuraylib::IMdl_compiler::MB_CUDA_PTX))
, m_context(mdl_factory->create_execution_context())
, m_link_unit()
{
check_success(m_be_cuda_ptx->set_option("num_texture_spaces", "1") == 0);
if (enable_derivatives) {
check_success(m_be_cuda_ptx->set_option("texture_runtime_with_derivs", "on") == 0);
}
check_success(m_be_cuda_ptx->set_option("tex_lookup_call_mode", "direct_call") == 0);
check_success(m_be_cuda_ptx->set_option(
"num_texture_results",
to_string(num_texture_results).c_str()) == 0);
#if !defined(MDL_SOURCE_RELEASE) && defined(MDL_ENABLE_INTERPRETER)
if (use_df_interpreter) {
check_success(m_be_cuda_ptx->set_option("enable_df_interpreter", "on") == 0);
}
#endif
}
{
size_t p = material_name.
rfind(
"::");
return material_name.
substr(0, p);
}
{
size_t p = material_name.
rfind(
"::");
if (p == std::string::npos)
return material_name;
return material_name.
substr(p + 2, material_name.
size() - p);
}
{
check_success(m_mdl_compiler->load_module(m_transaction.get(), module_name.
c_str()) >= 0);
const char *prefix = (module_name.
find(
"::") == 0) ?
"mdl" :
"mdl::";
mi::Size num_materials = module->get_material_count();
for (
mi::Size i = 0; i < num_materials; ++i) {
material_names[i] = module->get_material(i);
}
return material_names;
}
{
std::string module_name = get_module_name(material_name);
check_success(m_mdl_compiler->load_module(m_transaction.get(), module_name.
c_str(), m_context.get()) >= 0);
print_messages(m_context.get());
const char *prefix = (material_name.
find(
"::") == 0) ?
"mdl" :
"mdl::";
material_db_name.
c_str()));
check_success(material_definition);
m_material_defs.push_back(material_definition);
material_definition->create_material_instance(0, &result));
check_success(result == 0);
return material_instance.get();
}
bool class_compilation)
{
check_success(print_messages(m_context.get()));
m_compiled_materials.push_back(compiled_material);
return compiled_material.get();
}
{
m_be_cuda_ptx->translate_link_unit(m_link_unit.get(), m_context.get()));
check_success(print_messages(m_context.get()));
check_success(code_cuda_ptx);
#ifdef DUMP_PTX
#endif
return code_cuda_ptx;
}
bool Material_compiler::add_material_subexpr(
const char* path,
const char* fname,
bool class_compilation)
{
create_material_instance(material_name.
c_str()));
compile_material_instance(material_instance.get(), class_compilation));
m_link_unit->add_material_expression(compiled_material.get(), path, fname, m_context.get());
return print_messages(m_context.get());
}
bool Material_compiler::add_material_df(
const char* path,
const char* base_fname,
bool class_compilation)
{
create_material_instance(material_name.
c_str()));
compile_material_instance(material_instance.get(), class_compilation));
m_link_unit->add_material_df(compiled_material.get(), path, base_fname, m_context.get());
return print_messages(m_context.get());
}
bool Material_compiler::add_material(
mi::neuraylib::Target_function_description* function_descriptions,
bool class_compilation)
{
create_material_instance(material_name.
c_str()));
compile_material_instance(material_instance.get(), class_compilation));
m_link_unit->add_material(
compiled_material.get(), function_descriptions, description_count,
m_context.get());
bool res = print_messages(m_context.get());
if (res)
for (size_t i = 0; i < description_count; ++i)
function_descriptions[i].argument_block_index++;
return res;
}
void print_array_u32(
{
str += ".visible .const .align 4 .u32 " + name + "[";
if (count == 0) {
str += "1] = { 0 };\n";
} else {
str +=
to_string(count) +
"] = { " + content +
" };\n";
}
}
void print_array_func(
{
str += ".visible .const .align 8 .u64 " + name + "[";
if (count == 0) {
str += "1] = { dummy_func };\n";
} else {
str +=
to_string(count) +
"] = { " + content +
" };\n";
}
}
{
".version 4.0\n"
".target sm_20\n"
".address_size 64\n";
src += ".func dummy_func() { ret; }\n";
unsigned f_count = 0;
for (
size_t tc_index = 0, num = target_codes.
size(); tc_index < num; ++tc_index)
{
target_codes[tc_index];
tc_offsets += ", ";
func_index < func_count; ++func_index)
{
{
tc_indices += ", ";
function_names += ", ";
ab_indices += ", ";
}
f_count++;
func_index, mi::neuraylib::ITarget_code::SL_PTX);
src += '\n';
}
}
src +=
std::string(
".visible .const .align 4 .u32 mdl_target_code_count = ")
print_array_u32(
src,
std::string(
"mdl_target_code_offsets"),
unsigned(target_codes.
size()), tc_offsets);
src +=
std::string(
".visible .const .align 4 .u32 mdl_functions_count = ")
print_array_func(src,
std::string(
"mdl_functions"), f_count, function_names);
print_array_u32(src,
std::string(
"mdl_arg_block_indices"), f_count, ab_indices);
print_array_u32(src,
std::string(
"mdl_target_code_indices"), f_count, tc_indices);
return src;
}
CUmodule build_linked_kernel(
const char *ptx_file,
const char *kernel_function_name,
CUfunction *out_kernel_function)
{
std::string ptx_func_array_src = generate_func_array_ptx(target_codes);
#ifdef DUMP_PTX
std::cout <<
"Dumping CUDA PTX code for the \"mdl_expr_functions\" array:\n\n"
#endif
CUlinkState cuda_link_state;
CUmodule cuda_module;
void *linked_cubin;
size_t linked_cubin_size;
char error_log[8192], info_log[8192];
CUjit_option options[4];
void *optionVals[4];
options[0] = CU_JIT_INFO_LOG_BUFFER;
optionVals[0] = info_log;
options[1] = CU_JIT_INFO_LOG_BUFFER_SIZE_BYTES;
optionVals[1] = reinterpret_cast<void *>(uintptr_t(sizeof(info_log)));
options[2] = CU_JIT_ERROR_LOG_BUFFER;
optionVals[2] = error_log;
options[3] = CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES;
optionVals[3] = reinterpret_cast<void *>(uintptr_t(sizeof(error_log)));
check_cuda_success(cuLinkCreate(4, options, optionVals, &cuda_link_state));
CUresult link_result = CUDA_SUCCESS;
do {
for (
size_t i = 0, num_target_codes = target_codes.
size(); i < num_target_codes; ++i) {
link_result = cuLinkAddData(
cuda_link_state, CU_JIT_INPUT_PTX,
const_cast<char *>(target_codes[i]->get_code()),
target_codes[i]->get_code_size(),
nullptr, 0, nullptr, nullptr);
if (link_result != CUDA_SUCCESS) break;
}
if (link_result != CUDA_SUCCESS) break;
link_result = cuLinkAddData(
cuda_link_state, CU_JIT_INPUT_PTX,
const_cast<char *>(ptx_func_array_src.
c_str()),
ptx_func_array_src.
size(),
nullptr, 0, nullptr, nullptr);
if (link_result != CUDA_SUCCESS) break;
link_result = cuLinkAddFile(
cuda_link_state, CU_JIT_INPUT_PTX,
ptx_file, 0, nullptr, nullptr);
if (link_result != CUDA_SUCCESS) break;
link_result = cuLinkComplete(cuda_link_state, &linked_cubin, &linked_cubin_size);
} while (false);
if (link_result != CUDA_SUCCESS) {
check_cuda_success(link_result);
}
check_cuda_success(cuModuleLoadData(&cuda_module, linked_cubin));
check_cuda_success(cuModuleGetFunction(
out_kernel_function, cuda_module, kernel_function_name));
int regs = 0;
check_cuda_success(
cuFuncGetAttribute(®s, CU_FUNC_ATTRIBUTE_NUM_REGS, *out_kernel_function));
int lmem = 0;
check_cuda_success(
cuFuncGetAttribute(&lmem, CU_FUNC_ATTRIBUTE_LOCAL_SIZE_BYTES, *out_kernel_function));
check_cuda_success(cuLinkDestroy(cuda_link_state));
return cuda_module;
}
#endif // EXAMPLE_CUDA_SHARED_H
Source Code Location: examples/mdl_sdk/execution_cuda/example_execution_cuda.cu
#include <cuda_runtime.h>
#include <device_launch_parameters.h>
#include "texture_support_cuda.h"
#if defined(MI_NEURAYLIB_BSDF_USE_MATERIAL_IOR)
#define BSDF_USE_MATERIAL_IOR MI_NEURAYLIB_BSDF_USE_MATERIAL_IOR
using namespace mi::neuraylib;
#elif defined(MDL_CORE_BSDF_USE_MATERIAL_IOR)
#define BSDF_USE_MATERIAL_IOR MDL_CORE_BSDF_USE_MATERIAL_IOR
using namespace mi::mdl;
#endif
#ifdef ENABLE_DERIVATIVES
typedef Texture_handler_deriv Tex_handler;
#define TEX_VTABLE tex_deriv_vtable
#else
typedef Texture_handler Tex_handler;
#define TEX_VTABLE tex_vtable
#endif
struct Target_code_data
{
size_t num_textures;
Texture *textures;
char const *ro_data_segment;
};
extern __constant__ unsigned int mdl_functions_count;
extern __constant__ unsigned int mdl_arg_block_indices[];
extern __constant__ Mat_expr_func *mdl_functions[];
extern __constant__ unsigned int mdl_target_code_indices[];
{1.0f, 0.0f, 0.0f, 0.0f},
{0.0f, 1.0f, 0.0f, 0.0f},
{0.0f, 0.0f, 1.0f, 0.0f}
};
__device__ float radinv2(unsigned int bits)
{
bits = (bits << 16) | (bits >> 16);
bits = ((bits & 0x00ff00ff) << 8) | ((bits & 0xff00ff00) >> 8);
bits = ((bits & 0x0f0f0f0f) << 4) | ((bits & 0xf0f0f0f0) >> 4);
bits = ((bits & 0x33333333) << 2) | ((bits & 0xcccccccc) >> 2);
bits = ((bits & 0x55555555) << 1) | ((bits & 0xaaaaaaaa) >> 1);
return float(bits) / float(0x100000000ULL);
}
extern "C" __global__ void evaluate_mat_expr(
float3 *out_buf,
Target_code_data *tc_data_list,
char const **arg_block_list,
unsigned int width,
unsigned int height,
unsigned int num_samples)
{
unsigned int x = blockIdx.x * blockDim.x + threadIdx.x;
unsigned int y = blockIdx.y * blockDim.y + threadIdx.y;
if (x >= width || y >= height)
return;
float step_x = 1.f / width;
float step_y = 1.f / height;
float pos_x = 2.0f * x * step_x - 1.0f;
float pos_y = 2.0f * y * step_y - 1.0f;
float tex_x = float(x) * step_x;
float tex_y = float(y) * step_y;
unsigned int material_index =
((unsigned int)(tex_x * 4) ^ (unsigned int)(tex_y * 4)) % mdl_functions_count;
unsigned int tc_idx = mdl_target_code_indices[material_index];
char const *arg_block = arg_block_list[mdl_arg_block_indices[material_index]];
#ifdef ENABLE_DERIVATIVES
{ { tex_x, tex_y, 0.0f }, { step_x, 0.0f, 0.0f }, { 0.0f, step_y, 0.0f } } };
#else
tct_float3 texture_coords[1] = { { tex_x, tex_y, 0.0f } };
#endif
tct_float3 texture_tangent_u[1] = { { 1.0f, 0.0f, 0.0f } };
tct_float3 texture_tangent_v[1] = { { 0.0f, 1.0f, 0.0f } };
Mdl_state mdl_state = {
{ 0.0f, 0.0f, 1.0f },
{ 0.0f, 0.0f, 1.0f },
{ pos_x, pos_y, 0.0f },
0.0f,
texture_coords,
texture_tangent_u,
texture_tangent_v,
NULL,
tc_data_list[tc_idx].ro_data_segment,
identity,
identity,
0
};
Tex_handler tex_handler;
tex_handler.vtable = &TEX_VTABLE;
tex_handler.num_textures = tc_data_list[tc_idx].num_textures;
tex_handler.textures = tc_data_list[tc_idx].textures;
NULL, reinterpret_cast<Texture_handler_base *>(&tex_handler) };
float3 res = make_float3(0, 0, 0);
for (unsigned int i = 0; i < num_samples; ++i) {
float offs_x = float(i) / num_samples * step_x;
float offs_y = radinv2(i) * step_y;
mdl_state.position.x = pos_x + 2 * offs_x;
mdl_state.position.y = pos_y + 2 * offs_y;
#ifdef ENABLE_DERIVATIVES
texture_coords[0].val.x = tex_x + offs_x;
texture_coords[0].val.y = tex_y + offs_y;
#else
texture_coords[0].x = tex_x + offs_x;
texture_coords[0].y = tex_y + offs_y;
#endif
float3 cur_res;
mdl_functions[material_index](&cur_res, &mdl_state, &res_data_pair, NULL, arg_block);
res.x += cur_res.x;
res.y += cur_res.y;
res.z += cur_res.z;
}
res.x = powf(res.x / num_samples, 1.f / 2.2f);
res.y = powf(res.y / num_samples, 1.f / 2.2f);
res.z = powf(res.z / num_samples, 1.f / 2.2f);
out_buf[y * width + x] = res;
}