NvSciStream Performance Test Application

NvSciStream provides a test application to measure KPIs when streaming buffers between a CPU producer and CPU consumers. Because the test focuses on NvSciStream performance itself, it does not use CUDA, NvMedia, or other hardware engines. To simplify measuring packet-delivery latency for each payload, the stream uses FIFO mode.

This test application is intended for performance testing. It may simplify some setup steps, and it may attach synchronization objects or fences to the CPU endpoints even where they are not strictly needed, so that fence-transport latency is included in the measurement. For details on how to create a stream with the NvSciStream API, refer to the NvSciStream Sample Application.
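
The stream is built from the standard NvSciStream block types. The following is a minimal sketch, not taken from the test source, of how a single-process unicast FIFO configuration is assembled; packet setup, event handling, and the streaming loop are omitted (see the sample application for the complete flow):

    /* Minimal sketch (not the test source): single-process unicast
     * FIFO stream skeleton. Packet/element setup, event handling, and
     * the streaming loop are omitted. */
    #include <stdint.h>
    #include <nvscistream.h>

    static NvSciError createFifoStream(uint32_t numPackets,
                                       NvSciStreamBlock *producer,
                                       NvSciStreamBlock *consumer)
    {
        NvSciStreamBlock pool  = 0U;
        NvSciStreamBlock queue = 0U;
        NvSciError       err;

        /* Pool size corresponds to the -k option. */
        err = NvSciStreamStaticPoolCreate(numPackets, &pool);
        if (err != NvSciError_Success) { return err; }

        /* FIFO queue, so every payload is delivered and can be timed. */
        err = NvSciStreamFifoQueueCreate(&queue);
        if (err != NvSciError_Success) { return err; }

        err = NvSciStreamProducerCreate(pool, producer);
        if (err != NvSciError_Success) { return err; }
        err = NvSciStreamConsumerCreate(queue, consumer);
        if (err != NvSciError_Success) { return err; }

        /* Multicast or IPC blocks would be inserted here for the
         * other test configurations. */
        return NvSciStreamBlockConnect(*producer, *consumer);
    }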

This test uses the NvPlayFair library (see Benchmarking Library) to record timestamps, set the rate limit, save raw latency data, and calculate latency statistics (such as the min, max, and mean values) on different platforms and operating systems.
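
The NvPlayFair API itself is documented with the Benchmarking Library. As a rough illustration of the bookkeeping it takes care of, the sketch below records per-payload timestamps with the POSIX clock and reduces them to min, max, and mean; the helper names are hypothetical and are not NvPlayFair functions:

    /* Illustration only: per-payload timestamps and simple statistics.
     * The helper names are hypothetical; the test itself uses NvPlayFair. */
    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>

    static inline uint64_t nowMicroseconds(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ((uint64_t)ts.tv_sec * 1000000U) + ((uint64_t)ts.tv_nsec / 1000U);
    }

    static void reportLatencyMicroseconds(const uint64_t *samples, size_t count)
    {
        uint64_t minVal = UINT64_MAX, maxVal = 0U, sum = 0U;
        for (size_t i = 0U; i < count; i++) {
            if (samples[i] < minVal) { minVal = samples[i]; }
            if (samples[i] > maxVal) { maxVal = samples[i]; }
            sum += samples[i];
        }
        printf("latency (us): min=%llu max=%llu mean=%.2f\n",
               (unsigned long long)minVal, (unsigned long long)maxVal,
               (double)sum / (double)count);
    }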

The test app supports a variety of test cases:

  • Single-process, inter-process and inter-chip streaming
  • Unicast and multicast streaming

The test can set different stream configurations:

  • Number of packets allocated in the pool.
  • Number of payloads transmitted between the producer and consumers.
  • Buffer size for each element.
  • Number of synchronization objects used by each endpoint.
  • Frame rate: the frequency at which the producer presents payloads (see the sketch below).
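
The frame rate set with the -r option paces how often the producer presents payloads; the test relies on NvPlayFair's rate-limit support for this. Conceptually, it amounts to a fixed-period present loop such as the following sketch (illustrative only; presentPayload is a placeholder for the per-payload work):

    /* Conceptual fixed-rate present loop (illustration only; the test
     * relies on NvPlayFair for rate limiting). */
    #include <time.h>

    static void presentAtFixedRate(unsigned int fps, unsigned int numPayloads,
                                   void (*presentPayload)(unsigned int))
    {
        struct timespec next;
        const long periodNs = 1000000000L / (long)fps;

        clock_gettime(CLOCK_MONOTONIC, &next);
        for (unsigned int i = 0U; i < numPayloads; i++) {
            presentPayload(i);            /* get packet, fill buffer, present */
            next.tv_nsec += periodNs;     /* advance to the next time slot    */
            while (next.tv_nsec >= 1000000000L) {
                next.tv_nsec -= 1000000000L;
                next.tv_sec  += 1;
            }
            clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        }
    }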

The test measures several performance KPIs:

  • Latency for each process:
    • Total initialization time
    • Stream setup time
    • Streaming time
  • Latency for each payload:
    • Duration to wait for an available or ready packet
    • End-to-end packet-delivery latency
  • PCIe bandwidth for inter-chip streams

The README file in the test folder explains these KPIs in more detail.
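
For example, end-to-end packet-delivery latency compares the time at which the producer presents a payload with the time at which the consumer receives it. The sketch below shows one way such a measurement can be derived by stamping the payload with the present time; the payload layout and helper names are illustrative, not the test's actual buffer format:

    /* Sketch of deriving packet-delivery latency by stamping the payload
     * with the producer's present time. The layout is illustrative only. */
    #include <stdint.h>

    uint64_t nowMicroseconds(void);   /* clock_gettime() helper from the earlier sketch */

    typedef struct {
        uint64_t presentTimeUs;       /* written by the producer             */
        uint8_t  data[];              /* remainder of the CPU-mapped buffer  */
    } PayloadHeader;

    /* Producer side: called just before presenting the packet. */
    static void stampPayload(PayloadHeader *buf)
    {
        buf->presentTimeUs = nowMicroseconds();
        /* ... present the packet through NvSciStream here ... */
    }

    /* Consumer side: called right after acquiring the packet. */
    static uint64_t deliveryLatencyUs(const PayloadHeader *buf)
    {
        return nowMicroseconds() - buf->presentTimeUs;
    }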

Prerequisites

The same prerequisites apply as with the NvSciStream Sample Application.

Building the NvSciStream Performance Test Application

The NvSciStream performance test includes the source code, a README, and a Makefile.

On the host system, navigate to the test directory:

cd <top>/drive-linux/samples/nvsci/nvscistream/perf_tests/

Build the performance test application:
make clean
make

Running the NvSciStream Performance Test Application

The test application supports the following command-line options:

Option                     Meaning
-h                         Prints the supported test options.
-n <count>                 Specifies the number of consumers.
                           Set in the producer process.
                           Default: 1.
-k <count>                 Specifies the number of packets in the pool.
                           Set in the producer process for the primary pool.
                           Set in the consumer process for the C2C pool.
                           Default: 1.
-f <count>                 Specifies the number of payloads.
                           Set in all processes.
                           Default: 100.
-b <size>                  Specifies the buffer size (MB) per packet.
                           Default: 1.
-s <count>                 Specifies the number of sync objects per client.
                           Set by each process.
                           Default: 1.
-r <count>                 Specifies the producer frame-present rate (fps).
-l                         Measures latency.
                           Set in all processes.
                           Default: False.
-v                         Saves the raw latency data in a CSV file.
                           Ignored if latency is not measured.
                           Default: False.
-a <target>                Specifies the average KPI target (us) for packet-delivery
                           latency. The test result is compared with the input target
                           with a 5% tolerance.
                           Ignored if latency is not measured.
-m <target>                Specifies the 99.99th percentile KPI target (us) for
                           packet-delivery latency. The test result is compared with
                           the input target with a 5% tolerance.
                           Ignored if latency is not measured.

For inter-process operation:
-p                         Inter-process producer.
-c <index>                 Inter-process indexed consumer.

For inter-chip operation:
-P <index> <ipc endpoint>  Inter-SoC producer; the NvSciIpc endpoint name connected
                           to the indexed consumer.
-C <index> <ipc endpoint>  Inter-SoC consumer; the NvSciIpc endpoint used by this
                           indexed consumer.
On Linux, copy the test application to the target filesystem:
cp <top>/drive-linux/samples/nvsci/nvscistream/perf_tests/test_nvscistream_perf <top>/drive-linux/targetfs/home/nvidia/

Following are examples of running the performance test application with different configurations:

  • Measure latency for a single-process unicast stream with the default setup:

    ./test_nvscistream_perf -l

  • Measure latency for a single-process unicast stream with three packets in the pool:

    ./test_nvscistream_perf -l -k 3

  • Measure latency for a single-process multicast stream with two consumers:

    ./test_nvscistream_perf -n 2 -l

  • Measure latency for an inter-process unicast stream with the default setup:

    ./test_nvscistream_perf -p -l &

    ./test_nvscistream_perf -c 0 -l

  • Measure latency for an inter-process unicast stream with a fixed producer-present rate of 100 fps, transmitting 10,000 payloads:

    ./test_nvscistream_perf -p -f 10000 -l -r 100 &

    ./test_nvscistream_perf -c 0 -f 10000 -l

  • Measure latency and save the raw latency data to nvscistream_*.csv files for an inter-process unicast stream that transmits 10 payloads:

    ./test_nvscistream_perf -p -f 10 -l -v &

    ./test_nvscistream_perf -c 0 -f 10 -l -v

  • Measure PCIe bandwidth for an inter-chip unicast stream with a 12.5 MB buffer per packet, transmitting 10,000 frames. The two commands run on different SoCs connected by the pcie_s0_1 / pcie_s1_1 PCIe channel:

    On chip s0:

    ./test_nvscistream_perf -P 0 pcie_s0_1 -l -b 12.5 -f 10000

    On chip s1:

    ./test_nvscistream_perf -C 0 pcie_s1_1 -l -b 12.5 -f 10000
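
The endpoint names passed with -P and -C (pcie_s0_1 and pcie_s1_1 above) are NvSciIpc endpoints configured for the PCIe channel between the two SoCs. The following is an illustrative sketch, not the test's actual code, of how such a name is opened through NvSciIpc; the resulting handle is then used when creating the NvSciStream IPC blocks:

    /* Illustrative only: open the NvSciIpc endpoint named on the command
     * line. Error handling is minimal, and the handle would subsequently
     * be passed to the NvSciStream IPC block creation. */
    #include <nvsciipc.h>

    static NvSciError openC2cEndpoint(const char *name,
                                      NvSciIpcEndpoint *endpoint)
    {
        NvSciError err = NvSciIpcInit();
        if (err != NvSciError_Success) {
            return err;
        }
        /* "pcie_s0_1" / "pcie_s1_1" must exist in the NvSciIpc
         * configuration of the respective SoC. */
        return NvSciIpcOpenEndpoint(name, endpoint);
    }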

Note:

The test_nvscistream_perf application must be run as the root user (with sudo).

For the inter-process use case:

If the test fails to open the IPC channel, cleaning up the NvSciIpc resources may help:
sudo rm -rf /dev/mqueue/*
sudo rm -rf /dev/shm/*

For the inter-chip use case:

Ensure that the different SoCs are configured with different SoC IDs. For additional information, see the "Bind Options for SOC ID for C2C in GOS-DT" section in the AV PCT Configuration topic.