Python

Python API Changes

Allocating Buffers and Using a Name-Based Engine API

Binding-based implementation (uses the binding APIs that have since been removed):

def allocate_buffers(self, engine):
    '''
    Allocates all buffers required for an engine, i.e. host/device inputs/outputs.
    '''
    inputs = []
    outputs = []
    bindings = []
    stream = cuda.Stream()

    # binding is the name of an input/output tensor
    for binding in engine:
        size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
        dtype = trt.nptype(engine.get_binding_dtype(binding))

        # Allocate host and device buffers.
        host_mem = cuda.pagelocked_empty(size, dtype)  # page-locked memory buffer (won't be swapped to disk)
        device_mem = cuda.mem_alloc(host_mem.nbytes)

        # Append the device buffer address to device bindings.
        # When cast to int, it's a linear index into the context's memory (like a memory address).
        bindings.append(int(device_mem))

        # Append to the appropriate input/output list.
        if engine.binding_is_input(binding):
            inputs.append(self.HostDeviceMem(host_mem, device_mem))
        else:
            outputs.append(self.HostDeviceMem(host_mem, device_mem))

    return inputs, outputs, bindings, stream

Name-based implementation:

def allocate_buffers(self, engine):
    '''
    Allocates all buffers required for an engine, i.e. host/device inputs/outputs.
    '''
    inputs = []
    outputs = []
    bindings = []
    stream = cuda.Stream()

    for i in range(engine.num_io_tensors):
        tensor_name = engine.get_tensor_name(i)
        size = trt.volume(engine.get_tensor_shape(tensor_name))
        dtype = trt.nptype(engine.get_tensor_dtype(tensor_name))

        # Allocate host and device buffers.
        host_mem = cuda.pagelocked_empty(size, dtype)  # page-locked memory buffer (won't be swapped to disk)
        device_mem = cuda.mem_alloc(host_mem.nbytes)

        # Append the device buffer address to device bindings.
        # When cast to int, it's a linear index into the context's memory (like a memory address).
        bindings.append(int(device_mem))

        # Append to the appropriate input/output list.
        if engine.get_tensor_mode(tensor_name) == trt.TensorIOMode.INPUT:
            inputs.append(self.HostDeviceMem(host_mem, device_mem))
        else:
            outputs.append(self.HostDeviceMem(host_mem, device_mem))

    return inputs, outputs, bindings, stream
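
Both versions assume a small HostDeviceMem helper that pairs a page-locked host buffer with its device allocation; the class is not shown above, so the following is only a minimal sketch of what it might look like:

class HostDeviceMem:
    '''Pairs a page-locked host buffer with its matching device allocation.'''
    def __init__(self, host_mem, device_mem):
        self.host = host_mem
        self.device = device_mem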

Transition from enqueueV2 to enqueueV3 for Python

With execute_async_v2 (enqueueV2):

# Allocate device memory for inputs.
d_inputs = [cuda.mem_alloc(input_nbytes) for _ in range(input_num)]

# Allocate host (page-locked) and device memory for the output.
h_output = cuda.pagelocked_empty(output_nbytes, dtype=np.float32)
d_output = cuda.mem_alloc(h_output.nbytes)

# Transfer data from host to device.
cuda.memcpy_htod_async(d_inputs[0], input_a, stream)
cuda.memcpy_htod_async(d_inputs[1], input_b, stream)
cuda.memcpy_htod_async(d_inputs[2], input_c, stream)

# Run inference.
context.execute_async_v2(bindings=[int(d_inp) for d_inp in d_inputs] + [int(d_output)], stream_handle=stream.handle)

# Synchronize the stream.
stream.synchronize()

With execute_async_v3 (enqueueV3), each I/O tensor's device address is registered by name before enqueuing:

# Allocate device memory for inputs.
d_inputs = [cuda.mem_alloc(input_nbytes) for _ in range(input_num)]

# Allocate host (page-locked) and device memory for the output.
h_output = cuda.pagelocked_empty(output_nbytes, dtype=np.float32)
d_output = cuda.mem_alloc(h_output.nbytes)

# Transfer data from host to device.
cuda.memcpy_htod_async(d_inputs[0], input_a, stream)
cuda.memcpy_htod_async(d_inputs[1], input_b, stream)
cuda.memcpy_htod_async(d_inputs[2], input_c, stream)

# Set up tensor addresses.
bindings = [int(d_inputs[i]) for i in range(3)] + [int(d_output)]

for i in range(engine.num_io_tensors):
    context.set_tensor_address(engine.get_tensor_name(i), bindings[i])

# Run inference.
context.execute_async_v3(stream_handle=stream.handle)

# Synchronize the stream.
stream.synchronize()
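
Neither snippet copies the result back to the host. In a complete pipeline, an asynchronous device-to-host copy would typically be queued before synchronizing; a minimal sketch using the buffers defined above:

# Copy the output from device memory into the page-locked host buffer,
# then wait for all queued work on the stream to finish.
cuda.memcpy_dtoh_async(h_output, d_output, stream)
stream.synchronize()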

Engine Building: Use Only build_serialized_network

Previous approach, falling back to the removed build_engine API:

engine_bytes = None
try:
    engine_bytes = self.builder.build_serialized_network(self.network, self.config)
except AttributeError:
    engine = self.builder.build_engine(self.network, self.config)
    engine_bytes = engine.serialize()
    del engine
assert engine_bytes

Using only build_serialized_network:

engine_bytes = self.builder.build_serialized_network(self.network, self.config)
if engine_bytes is None:
    log.error("Failed to create engine")
    sys.exit(1)
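
The serialized engine can then be deserialized for inference with the runtime API; a minimal sketch (the logger severity shown here is only an illustration):

runtime = trt.Runtime(trt.Logger(trt.Logger.WARNING))
engine = runtime.deserialize_cuda_engine(engine_bytes)
assert engine is not None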

Added Python APIs

Types

  • APILanguage

  • ExecutionContextAllocationStrategy

  • IGpuAsyncAllocator

  • InterfaceInfo

  • IPluginResource

  • IPluginV3

  • IStreamReader

  • IVersionedInterface

Methods and Properties

  • ICudaEngine.is_debug_tensor()

  • ICudaEngine.minimum_weight_streaming_budget

  • ICudaEngine.streamable_weights_size

  • ICudaEngine.weight_streaming_budget (see the sketch after this list)

  • IExecutionContext.get_debug_listener()

  • IExecutionContext.get_debug_state()

  • IExecutionContext.set_all_tensors_debug_state()

  • IExecutionContext.set_debug_listener()

  • IExecutionContext.set_tensor_debug_state()

  • IExecutionContext.update_device_memory_size_for_shapes()

  • IGpuAllocator.allocate_async()

  • IGpuAllocator.deallocate_async()

  • INetworkDefinition.add_plugin_v3()

  • INetworkDefinition.is_debug_tensor()

  • INetworkDefinition.mark_debug()

  • INetworkDefinition.unmark_debug()

  • IPluginRegistry.acquire_plugin_resource()

  • IPluginRegistry.all_creators

  • IPluginRegistry.deregister_creator()

  • IPluginRegistry.get_creator()

  • IPluginRegistry.register_creator()

  • IPluginRegistry.release_plugin_resource()
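
As an illustration of the new weight-streaming properties listed above, the streaming budget can be queried and adjusted on a deserialized engine before an execution context is created; a minimal sketch, assuming the engine was built with weight streaming enabled:

# Total bytes of weights eligible for streaming, and the smallest budget
# the engine will accept.
print(engine.streamable_weights_size)
print(engine.minimum_weight_streaming_budget)

# Cap device memory spent on weights at the minimum allowed budget.
engine.weight_streaming_budget = engine.minimum_weight_streaming_budget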

Removed Python APIs

Each removed Python API is listed next to the API that supersedes it.

  • BuilderFlag.ENABLE_TACTIC_HEURISTIC > Builder optimization level 2

  • BuilderFlag.STRICT_TYPES > Use all three flags: BuilderFlag.DIRECT_IO, BuilderFlag.PREFER_PRECISION_CONSTRAINTS, BuilderFlag.REJECT_EMPTY_ALGORITHMS

  • EngineCapability.DEFAULT > EngineCapability.STANDARD

  • EngineCapability.SAFE_DLA > EngineCapability.DLA_STANDALONE

  • EngineCapability.SAFE_GPU > EngineCapability.SAFETY

  • IAlgorithmIOInfo.tensor_format > The strides, data type, and vectorization information are sufficient to identify tensor formats uniquely.

  • IBuilder.max_batch_size > Implicit batch support was removed

  • IBuilderConfig.max_workspace_size > IBuilderConfig.set_memory_pool_limit() with MemoryPoolType.WORKSPACE, IBuilderConfig.get_memory_pool_limit() with MemoryPoolType.WORKSPACE (see the sketch after this list)

  • IBuilderConfig.min_timing_iterations > IBuilderConfig.avg_timing_iterations

  • ICudaEngine.binding_is_input() > ICudaEngine.get_tensor_mode()

  • ICudaEngine.get_binding_bytes_per_component() > ICudaEngine.get_tensor_bytes_per_component()

  • ICudaEngine.get_binding_components_per_element() > ICudaEngine.get_tensor_components_per_element()

  • ICudaEngine.get_binding_dtype() > ICudaEngine.get_tensor_dtype()

  • ICudaEngine.get_binding_format() > ICudaEngine.get_tensor_format()

  • ICudaEngine.get_binding_format_desc() > ICudaEngine.get_tensor_format_desc()

  • ICudaEngine.get_binding_index() > No name-based equivalent replacement

  • ICudaEngine.get_binding_name() > No name-based equivalent replacement

  • ICudaEngine.get_binding_shape() > ICudaEngine.get_tensor_shape()

  • ICudaEngine.get_binding_vectorized_dim() > ICudaEngine.get_tensor_vectorized_dim()

  • ICudaEngine.get_location() > ITensor.location

  • ICudaEngine.get_profile_shape() > ICudaEngine.get_tensor_profile_shape()

  • ICudaEngine.get_profile_shape_input() > ICudaEngine.get_tensor_profile_values()

  • ICudaEngine.has_implicit_batch_dimension() > Implicit batch is no longer supported

  • ICudaEngine.is_execution_binding() > No name-based equivalent replacement

  • ICudaEngine.is_shape_binding() > ICudaEngine.is_shape_inference_io()

  • ICudaEngine.max_batch_size() > Implicit batch is no longer supported

  • ICudaEngine.num_bindings() > ICudaEngine.num_io_tensors()

  • IExecutionContext.get_binding_shape() > IExecutionContext.get_tensor_shape()

  • IExecutionContext.get_strides() > IExecutionContext.get_tensor_strides()

  • IExecutionContext.set_binding_shape() > IExecutionContext.set_input_shape()

  • IFullyConnectedLayer > IMatrixMultiplyLayer

  • INetworkDefinition.add_convolution() > INetworkDefinition.add_convolution_nd()

  • INetworkDefinition.add_deconvolution() > INetworkDefinition.add_deconvolution_nd()

  • INetworkDefinition.add_fully_connected() > INetworkDefinition.add_matrix_multiply()

  • INetworkDefinition.add_padding() > INetworkDefinition.add_padding_nd()

  • INetworkDefinition.add_pooling() > INetworkDefinition.add_pooling_nd()

  • INetworkDefinition.add_rnn_v2() > INetworkDefinition.add_loop()

  • INetworkDefinition.has_explicit_precision > Explicit precision support was removed in 10.0

  • INetworkDefinition.has_implicit_batch_dimension > Implicit batch support was removed

  • IRNNv2Layer > ILoop

  • NetworkDefinitionCreationFlag.EXPLICIT_BATCH > Support was removed in 10.0

  • NetworkDefinitionCreationFlag.EXPLICIT_PRECISION > Support was removed in 10.0

  • PaddingMode.CAFFE_ROUND_DOWN > Caffe support was removed

  • PaddingMode.CAFFE_ROUND_UP > Caffe support was removed

  • PreviewFeature.DISABLE_EXTERNAL_TACTIC_SOURCES_FOR_CORE_0805 > External tactics are always disabled for core code

  • PreviewFeature.FASTER_DYNAMIC_SHAPES_0805 > This flag is on by default

  • ProfilingVerbosity.DEFAULT > ProfilingVerbosity.LAYER_NAMES_ONLY

  • ProfilingVerbosity.VERBOSE > ProfilingVerbosity.DETAILED

  • ResizeMode > Use InterpolationMode. Alias was removed.

  • SampleMode.DEFAULT > SampleMode.STRICT_BOUNDS

  • SliceMode > Use SampleMode. Alias was removed.
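
For example, the workspace migration noted above replaces the removed IBuilderConfig.max_workspace_size attribute with a memory-pool limit; a minimal sketch (the 1 GiB value is only an illustration):

config = builder.create_builder_config()

# Replaces the removed: config.max_workspace_size = 1 << 30
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 1 << 30)
print(config.get_memory_pool_limit(trt.MemoryPoolType.WORKSPACE))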