NvSciStream Performance Test Application
NvSciStream provides a test application to measure KPIs when streaming buffers between a CPU producer and CPU consumers. Because the test focuses on the performance of NvSciStream itself, it does not use CUDA, NvMedia, or other hardware engines. To simplify measuring packet-delivery latency for each payload, the stream uses FIFO mode.
This test application is intended for performance testing only. It may simplify some setup steps, and it attaches synchronization objects or fences to the CPU endpoints that are otherwise unnecessary, so that the fence transport latency is included in the measurement. To see how to create a stream with the NvSciStream API, refer to NvSciStream Sample Application.
This test uses the NvPlayFair library (see Benchmarking Library) to record timestamps, set the rate limit, save raw latency data, and calculate the latency statistics (such as the min, max, and mean value) on different platforms and operating systems.
The test app supports a variety of test cases:
- Single-process, inter-process and inter-chip streaming
- Unicast and multicast streaming
The test can set different stream configurations:
- Number of packets allocated in pool.
- Number of payloads transmitted between producer and consumers.
- Buffer size for each element.
- Number of synchronization objects used by each endpoint.
- Frame rate (the frequency at which the producer presents payloads).
The test measures several performance KPIs:
- Latency for each process:
  - Total initialization time
  - Stream setup time
  - Streaming time
- Latency for each payload:
  - Duration to wait for an available or ready packet
  - End-to-end packet-delivery latency
- PCIe bandwidth in inter-chip stream
The README file in the test folder explains these KPIs in more detail.
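The latency statistics mentioned above (min, max, mean, and a high percentile) can be reproduced offline from raw samples. The following is a minimal sketch, not the NvPlayFair implementation: the function name `latency_stats` is hypothetical, the percentile uses the nearest-rank method, and the input is assumed to be one latency value in microseconds per line. The actual layout of the CSV files written by the test is not described here, so a real file may first need a `cut` or `awk` step to extract the latency column.

```shell
#!/bin/sh
# latency_stats [PERCENTILE]
# Reads one latency sample (us) per line on stdin and prints
# min, max, mean, and a nearest-rank percentile (default 99.99).
# Assumption: plain numeric column; the real CSV layout may differ.
latency_stats() {
    pct="${1:-99.99}"
    sort -n | awk -v pct="$pct" '
        { v[NR] = $1; sum += $1 }
        END {
            if (NR == 0) { print "no samples"; exit 1 }
            # Nearest rank: smallest r such that r / NR >= pct / 100.
            r = int(pct / 100 * NR)
            if (r < pct / 100 * NR) r++
            if (r < 1) r = 1
            printf "min=%s max=%s mean=%.2f p%s=%s\n", v[1], v[NR], sum / NR, pct, v[r]
        }'
}
```

For example, `printf '%s\n' 30 10 20 40 | latency_stats 50` summarizes four samples and reports their median as the 50th percentile.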
Prerequisites
The same prerequisites apply as with the NvSciStream Sample Application.
Building the NvSciStream Performance Test Application
The NvSciStream performance test includes source code, README, and a Makefile.
On the host system, navigate to the test directory and build the application:
cd <top>/drive-linux/samples/nvsci/nvscistream/perf_tests/
make clean
make
Running the NvSciStream Performance Test Application
Option | Meaning | Default |
---|---|---|
-h | Prints supported test options | |
-n <count> | Specifies the number of consumers. Set in the producer process. | 1 |
-k <count> | Specifies the number of packets in pool. Set in the producer process for primary pool. Set in the consumer process for c2c pool. | 1 |
-f <count> | Specifies the number of payloads. Set in all processes. | 100 |
-b <size> | Specifies the buffer size (MB) per packet. | 1 |
-s <count> | Specifies the number of sync objects per client. Set by each process. | 1 |
-r <count> | Specifies the producer frame-present rate (fps) | |
-l | Measure latency. Set in all processes. | False |
-v | Save the latency raw data in csv file. Ignored if not measuring latency. | False |
-a <target> | Specifies the average KPI target (us) for packet-delivery latency. Compare the test result with the input target with 5% tolerance. Ignored if not measuring latency. | |
-m <target> | Specifies the 99.99 percentile KPI target (us) for packet-delivery latency. Compare the test result with the input target with 5% tolerance. Ignored if not measuring latency. | |
For inter-process operation: | | |
-p | Inter-process producer. | |
-c <index> | Inter-process indexed consumer. | |
For inter-chip operations: | | |
-P <index> <Ipc endpoint> | Inter-SoC producer, NvSciIpc endpoint name connected to indexed consumer. | |
-C <index> <Ipc endpoint> | Inter-SoC consumer, NvSciIpc endpoint used by this indexed consumer. | |
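The -a and -m options compare a measured latency with a target using a 5% tolerance. The sketch below illustrates one plausible reading of that comparison. It is an assumption, not the test's actual logic: the helper name `within_target` is hypothetical, and it treats the tolerance as an upper bound (a result passes when it is no more than 5% above the target). The README in the test folder is the authority on the exact rule.

```shell
#!/bin/sh
# within_target MEASURED_US TARGET_US
# Succeeds (exit status 0) when MEASURED_US <= TARGET_US * 1.05.
# Assumption: the 5% tolerance is an upper bound on the target.
within_target() {
    awk -v m="$1" -v t="$2" 'BEGIN { exit !(m <= t * 1.05) }'
}
```

Under this reading, `within_target 104 100` succeeds and `within_target 106 100` fails.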
Copy the test binary to the target file system:
cp <top>/drive-linux/samples/nvsci/nvscistream/perf_tests/test_nvscistream_perf <top>/drive-linux/targetfs/home/nvidia/
Following are examples of running the performance test application with different configurations:
- Measure latency for a single-process unicast stream with the default setup:
  ./test_nvscistream_perf -l
- Measure latency for a single-process unicast stream with three packets in the pool:
  ./test_nvscistream_perf -l -k 3
- Measure latency for a single-process multicast stream with two consumers:
  ./test_nvscistream_perf -n 2 -l
- Measure latency for an inter-process unicast stream with the default setup:
  ./test_nvscistream_perf -p -l &
  ./test_nvscistream_perf -c 0 -l
- Measure latency for an inter-process unicast stream that transmits 10,000 payloads at a fixed producer-present rate of 100 fps:
  ./test_nvscistream_perf -p -f 10000 -l -r 100 &
  ./test_nvscistream_perf -c 0 -f 10000 -l
- Measure latency for an inter-process unicast stream that transmits 10 payloads, and save the raw latency data in nvscistream_*.csv files:
  ./test_nvscistream_perf -p -f 10 -l -v &
  ./test_nvscistream_perf -c 0 -f 10 -l -v
- Measure PCIe bandwidth for an inter-chip unicast stream that transmits 10,000 frames with a 12.5 MB buffer per packet. The two commands run on different SoCs and use the pcie_s0_1 and pcie_s1_1 endpoints of the PCIe channel:
  On chip s0:
  ./test_nvscistream_perf -P 0 pcie_s0_1 -l -b 12.5 -f 10000
  On chip s1:
  ./test_nvscistream_perf -C 0 pcie_s1_1 -l -b 12.5 -f 10000
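Before launching a long run, it can help to estimate how much data a configuration moves and how long a rate-limited run takes at minimum. The sketch below does that arithmetic for the -f, -b, and -r values used in the examples above. The helper names are hypothetical, and the bandwidth formula (total data divided by streaming time) is an assumption about how such a figure could be derived, not a description of how the test computes its PCIe bandwidth KPI.

```shell
#!/bin/sh
# stream_estimate PAYLOADS BUFFER_MB [FPS]
# Prints the total data volume per consumer and, when a frame rate is
# given, the minimum run time implied by the producer rate limit.
stream_estimate() {
    awk -v n="$1" -v mb="$2" -v fps="${3:-0}" 'BEGIN {
        printf "total_data_mb=%.1f\n", n * mb
        if (fps > 0) printf "min_duration_s=%.1f\n", n / fps
    }'
}

# bandwidth_mb_s TOTAL_MB STREAMING_TIME_S
# Assumed formula: average bandwidth = total data / streaming time.
bandwidth_mb_s() {
    awk -v mb="$1" -v s="$2" 'BEGIN { printf "%.1f\n", mb / s }'
}
```

For the inter-chip example, `stream_estimate 10000 12.5` gives a total of 125000.0 MB, and for the rate-limited example `stream_estimate 10000 1 100` gives a minimum duration of 100.0 seconds.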
The test_nvscistream_perf application must be run as the root user (with sudo).
For the inter-process use case, clear any leftover message queues and shared-memory files:
sudo rm -rf /dev/mqueue/*
sudo rm -rf /dev/shm/*
For the inter-chip use case, ensure that each SoC is configured with a different SoC ID. For additional information, see the "Bind Options for SOC ID for C2C in GOS-DT" section in the AV PCT Configuration topic.
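The examples above cover inter-process unicast only. Based on the option table (-n is set in the producer, -c takes a consumer index), an inter-process multicast run would presumably start one producer and one indexed consumer per stream branch. The sketch below only prints the command lines for such a run so they can be reviewed first. The helper name `print_interprocess_cmds` is hypothetical, and the combination of -p with -n is an inference from the table, not a documented example.

```shell
#!/bin/sh
# print_interprocess_cmds CONSUMERS PAYLOADS
# Prints (does not execute) one producer command and one command per
# indexed consumer. All but the last command are backgrounded with '&'.
# Assumption: -p accepts -n for inter-process multicast.
print_interprocess_cmds() {
    consumers="$1"
    payloads="$2"
    echo "./test_nvscistream_perf -p -n $consumers -f $payloads -l &"
    i=0
    while [ "$i" -lt "$consumers" ]; do
        if [ "$i" -lt $((consumers - 1)) ]; then
            echo "./test_nvscistream_perf -c $i -f $payloads -l &"
        else
            echo "./test_nvscistream_perf -c $i -f $payloads -l"
        fi
        i=$((i + 1))
    done
}
```

For example, `print_interprocess_cmds 2 100` prints three lines: the producer, consumer 0, and consumer 1. The printed commands still need to be run as root, as noted above.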