Python API Changes
Allocating Buffers and Using a Name-Based Engine API
Before (deprecated binding-based API):

```python
def allocate_buffers(self, engine):
    '''
    Allocates all buffers required for an engine, i.e. host/device inputs/outputs.
    '''
    inputs = []
    outputs = []
    bindings = []
    stream = cuda.Stream()

    # binding is the name of an input/output tensor
    for binding in engine:
        size = trt.volume(engine.get_binding_shape(binding)) * engine.max_batch_size
        dtype = trt.nptype(engine.get_binding_dtype(binding))

        # Allocate host and device buffers
        host_mem = cuda.pagelocked_empty(size, dtype)  # page-locked memory buffer (won't be swapped to disk)
        device_mem = cuda.mem_alloc(host_mem.nbytes)

        # Append the device buffer address to device bindings.
        # When cast to int, it gives the device memory address.
        bindings.append(int(device_mem))

        # Append to the appropriate input/output list.
        if engine.binding_is_input(binding):
            inputs.append(self.HostDeviceMem(host_mem, device_mem))
        else:
            outputs.append(self.HostDeviceMem(host_mem, device_mem))

    return inputs, outputs, bindings, stream
```
After (name-based API):

```python
def allocate_buffers(self, engine):
    '''
    Allocates all buffers required for an engine, i.e. host/device inputs/outputs.
    '''
    inputs = []
    outputs = []
    bindings = []
    stream = cuda.Stream()

    for i in range(engine.num_io_tensors):
        tensor_name = engine.get_tensor_name(i)
        size = trt.volume(engine.get_tensor_shape(tensor_name))
        dtype = trt.nptype(engine.get_tensor_dtype(tensor_name))

        # Allocate host and device buffers
        host_mem = cuda.pagelocked_empty(size, dtype)  # page-locked memory buffer (won't be swapped to disk)
        device_mem = cuda.mem_alloc(host_mem.nbytes)

        # Append the device buffer address to device bindings.
        # When cast to int, it gives the device memory address.
        bindings.append(int(device_mem))

        # Append to the appropriate input/output list.
        if engine.get_tensor_mode(tensor_name) == trt.TensorIOMode.INPUT:
            inputs.append(self.HostDeviceMem(host_mem, device_mem))
        else:
            outputs.append(self.HostDeviceMem(host_mem, device_mem))

    return inputs, outputs, bindings, stream
```
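Both versions store each buffer pair in a `self.HostDeviceMem` helper that the snippets do not define. A minimal sketch of such a helper (the class name comes from the snippets above; this particular implementation is an assumption, with plain placeholders standing in for the PyCUDA objects):

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class HostDeviceMem:
    """Pairs a page-locked host buffer with its matching device allocation.

    In real use, `host` would hold the array returned by cuda.pagelocked_empty
    and `device` the pointer returned by cuda.mem_alloc.
    """
    host: Any
    device: Any

# Placeholder values instead of real CUDA allocations:
mem = HostDeviceMem(host=[0.0] * 4, device=0xDEAD)
print(mem.device)  # → 57005
```

Keeping the pair together makes the later host-to-device and device-to-host copies easy to express per tensor.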
Transition from enqueueV2 to enqueueV3 for Python
Before (enqueueV2):

```python
# Allocate device memory for inputs.
d_inputs = [cuda.mem_alloc(input_nbytes) for _ in range(input_num)]

# Allocate device memory for outputs.
h_output = cuda.pagelocked_empty(output_nbytes, dtype=np.float32)
d_output = cuda.mem_alloc(h_output.nbytes)

# Transfer data from host to device.
cuda.memcpy_htod_async(d_inputs[0], input_a, stream)
cuda.memcpy_htod_async(d_inputs[1], input_b, stream)
cuda.memcpy_htod_async(d_inputs[2], input_c, stream)

# Run inference
context.execute_async_v2(bindings=[int(d_inp) for d_inp in d_inputs] + [int(d_output)], stream_handle=stream.handle)

# Synchronize the stream
stream.synchronize()
```
After (enqueueV3):

```python
# Allocate device memory for inputs.
d_inputs = [cuda.mem_alloc(input_nbytes) for _ in range(input_num)]

# Allocate device memory for outputs.
h_output = cuda.pagelocked_empty(output_nbytes, dtype=np.float32)
d_output = cuda.mem_alloc(h_output.nbytes)

# Transfer data from host to device.
cuda.memcpy_htod_async(d_inputs[0], input_a, stream)
cuda.memcpy_htod_async(d_inputs[1], input_b, stream)
cuda.memcpy_htod_async(d_inputs[2], input_c, stream)

# Set up tensor addresses
bindings = [int(d_inputs[i]) for i in range(3)] + [int(d_output)]

for i in range(engine.num_io_tensors):
    context.set_tensor_address(engine.get_tensor_name(i), bindings[i])

# Run inference
context.execute_async_v3(stream_handle=stream.handle)

# Synchronize the stream
stream.synchronize()
```
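The enqueueV3 snippet relies on the engine enumerating the three inputs before the output, so that `bindings[i]` lines up with `engine.get_tensor_name(i)`. Keying addresses by tensor name removes that positional assumption. A pure-Python sketch of the idea (the tensor names and pointer values here are made up, and the `set_tensor_address` calls are shown as comments since they need a real context):

```python
# Hypothetical tensor names and placeholder device pointers; with a real
# engine these would come from engine.get_tensor_name(i) and cuda.mem_alloc.
addresses = {
    "input_a": 0x7F000000,
    "input_b": 0x7F000100,
    "input_c": 0x7F000200,
    "output":  0x7F000300,
}

# With a real execution context, each pair would be registered before enqueueV3:
# for name, addr in addresses.items():
#     context.set_tensor_address(name, addr)

print(len(addresses))  # → 4
```

Because lookup is by name, reordering I/O tensors in the network no longer silently misassigns buffers.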
Engine Building: Use Only build_serialized_network
Before (falls back to the removed build_engine):

```python
engine_bytes = None
try:
    engine_bytes = self.builder.build_serialized_network(self.network, self.config)
except AttributeError:
    engine = self.builder.build_engine(self.network, self.config)
    engine_bytes = engine.serialize()
    del engine
assert engine_bytes
```
After:

```python
engine_bytes = self.builder.build_serialized_network(self.network, self.config)
if engine_bytes is None:
    log.error("Failed to create engine")
    sys.exit(1)
```
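Since build_serialized_network returns the serialized plan directly, it can be written straight to disk for later deserialization, with no intermediate engine object. A minimal sketch, where the placeholder bytes and the `model.plan` file name are assumptions:

```python
import os
import tempfile

engine_bytes = b"\x00fake-serialized-plan"  # placeholder for the real plan bytes

# Write the plan to disk.
plan_path = os.path.join(tempfile.mkdtemp(), "model.plan")
with open(plan_path, "wb") as f:
    f.write(engine_bytes)

# Later, the bytes can be read back and handed to a runtime, e.g.
# trt.Runtime(logger).deserialize_cuda_engine(data).
with open(plan_path, "rb") as f:
    data = f.read()
print(data == engine_bytes)  # → True
```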
Added Python APIs
Types
APILanguage
ExecutionContextAllocationStrategy
IGpuAsyncAllocator
InterfaceInfo
IPluginResource
IPluginV3
IStreamReader
IVersionedInterface
Methods and Properties
ICudaEngine.is_debug_tensor()
ICudaEngine.minimum_weight_streaming_budget
ICudaEngine.streamable_weights_size
ICudaEngine.weight_streaming_budget
IExecutionContext.get_debug_listener()
IExecutionContext.get_debug_state()
IExecutionContext.set_all_tensors_debug_state()
IExecutionContext.set_debug_listener()
IExecutionContext.set_tensor_debug_state()
IExecutionContext.update_device_memory_size_for_shapes()
IGpuAllocator.allocate_async()
IGpuAllocator.deallocate_async()
INetworkDefinition.add_plugin_v3()
INetworkDefinition.is_debug_tensor()
INetworkDefinition.mark_debug()
INetworkDefinition.unmark_debug()
IPluginRegistry.acquire_plugin_resource()
IPluginRegistry.all_creators
IPluginRegistry.deregister_creator()
IPluginRegistry.get_creator()
IPluginRegistry.register_creator()
IPluginRegistry.release_plugin_resource()
Removed Python APIs
Each removed Python API is listed below alongside the API that supersedes it.
BuilderFlag.ENABLE_TACTIC_HEURISTIC > Builder optimization level 2
BuilderFlag.STRICT_TYPES > Use all three flags: BuilderFlag.DIRECT_IO, BuilderFlag.PREFER_PRECISION_CONSTRAINTS, BuilderFlag.REJECT_EMPTY_ALGORITHMS
EngineCapability.DEFAULT > EngineCapability.STANDARD
EngineCapability.kSAFE_DLA > EngineCapability.DLA_STANDALONE
EngineCapability.SAFE_GPU > EngineCapability.SAFETY
IAlgorithmIOInfo.tensor_format > The strides, data type, and vectorization information are sufficient to identify tensor formats uniquely
IBuilder.max_batch_size > Implicit batch support was removed
IBuilderConfig.max_workspace_size > IBuilderConfig.set_memory_pool_limit() and IBuilderConfig.get_memory_pool_limit() with MemoryPoolType.WORKSPACE
IBuilderConfig.min_timing_iterations > IBuilderConfig.avg_timing_iterations
ICudaEngine.binding_is_input() > ICudaEngine.get_tensor_mode()
ICudaEngine.get_binding_bytes_per_component() > ICudaEngine.get_tensor_bytes_per_component()
ICudaEngine.get_binding_components_per_element() > ICudaEngine.get_tensor_components_per_element()
ICudaEngine.get_binding_dtype() > ICudaEngine.get_tensor_dtype()
ICudaEngine.get_binding_format() > ICudaEngine.get_tensor_format()
ICudaEngine.get_binding_format_desc() > ICudaEngine.get_tensor_format_desc()
ICudaEngine.get_binding_index() > No name-based equivalent replacement
ICudaEngine.get_binding_name() > No name-based equivalent replacement
ICudaEngine.get_binding_shape() > ICudaEngine.get_tensor_shape()
ICudaEngine.get_binding_vectorized_dim() > ICudaEngine.get_tensor_vectorized_dim()
ICudaEngine.get_location() > ITensor.location
ICudaEngine.get_profile_shape() > ICudaEngine.get_tensor_profile_shape()
ICudaEngine.get_profile_shape_input() > ICudaEngine.get_tensor_profile_values()
ICudaEngine.has_implicit_batch_dimension() > Implicit batch is no longer supported
ICudaEngine.is_execution_binding() > No name-based equivalent replacement
ICudaEngine.is_shape_binding() > ICudaEngine.is_shape_inference_io()
ICudaEngine.max_batch_size() > Implicit batch is no longer supported
ICudaEngine.num_bindings() > ICudaEngine.num_io_tensors()
IExecutionContext.get_binding_shape() > IExecutionContext.get_tensor_shape()
IExecutionContext.get_strides() > IExecutionContext.get_tensor_strides()
IExecutionContext.set_binding_shape() > IExecutionContext.set_input_shape()
IFullyConnectedLayer > IMatrixMultiplyLayer
INetworkDefinition.add_convolution() > INetworkDefinition.add_convolution_nd()
INetworkDefinition.add_deconvolution() > INetworkDefinition.add_deconvolution_nd()
INetworkDefinition.add_fully_connected() > INetworkDefinition.add_matrix_multiply()
INetworkDefinition.add_padding() > INetworkDefinition.add_padding_nd()
INetworkDefinition.add_pooling() > INetworkDefinition.add_pooling_nd()
INetworkDefinition.add_rnn_v2() > INetworkDefinition.add_loop()
INetworkDefinition.has_explicit_precision > Explicit precision support was removed in 10.0
INetworkDefinition.has_implicit_batch_dimension > Implicit batch support was removed
IRNNv2Layer > ILoop
NetworkDefinitionCreationFlag.EXPLICIT_BATCH > Support was removed in 10.0
NetworkDefinitionCreationFlag.EXPLICIT_PRECISION > Support was removed in 10.0
PaddingMode.CAFFE_ROUND_DOWN > Caffe support was removed
PaddingMode.CAFFE_ROUND_UP > Caffe support was removed
PreviewFeature.DISABLE_EXTERNAL_TACTIC_SOURCES_FOR_CORE_0805 > External tactics are always disabled for core code
PreviewFeature.FASTER_DYNAMIC_SHAPES_0805 > This flag is on by default
ProfilingVerbosity.DEFAULT > ProfilingVerbosity.LAYER_NAMES_ONLY
ProfilingVerbosity.VERBOSE > ProfilingVerbosity.DETAILED
ResizeMode > Use InterpolationMode; the alias was removed
SampleMode.DEFAULT > SampleMode.STRICT_BOUNDS
SliceMode > Use SampleMode; the alias was removed
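Many of the binding-to-tensor renames above are mechanical, so a small lookup table can drive a search-and-replace pass over a codebase. A sketch (the method names are copied from the list above; the dict and `migrate` helper themselves are just an illustration, and note that not every entry is a pure rename, e.g. get_tensor_mode returns a TensorIOMode rather than a bool):

```python
# Mechanical renames taken from the removed-API list above.
BINDING_TO_TENSOR = {
    "get_binding_shape": "get_tensor_shape",
    "get_binding_dtype": "get_tensor_dtype",
    "get_binding_format": "get_tensor_format",
    "binding_is_input": "get_tensor_mode",  # semantics differ: returns TensorIOMode, not bool
    "num_bindings": "num_io_tensors",
}

def migrate(line: str) -> str:
    """Rewrite one source line using the rename table."""
    for old, new in BINDING_TO_TENSOR.items():
        line = line.replace(old, new)
    return line

print(migrate("shape = engine.get_binding_shape(name)"))
# → shape = engine.get_tensor_shape(name)
```

Each rewritten call site still needs a manual check, since some replacements change return types or argument conventions.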