NvSciStream Sample Application

The NVIDIA SDK provides a single sample application that demonstrates how to use the NvSciStream API to build simple and complex streams. This application combines all the features of the multiple separate samples provided in previous versions of the SDK and illustrates some new ones, including:

  • Both single- and multi-cast streaming
  • Intra-process, inter-process, and inter-chip streaming, or a combination of the three when multicasting
  • Waiting for events on single blocks using the NvSciStream functions directly or on all blocks at once using an NvSciEventService
  • CUDA to CUDA streaming, available in both safety and non-safety builds, plus NvMedia to CUDA streaming, available in non-safety builds

Migration

Those familiar with the previously provided samples should be aware of several significant changes.

The complex C++ classes are eliminated in favor of flatter C code. The top-level setup that connects all the NvSciStream blocks can be found in the main.c file, while all other block operations are split into separate files for each block type. This provides a clearer example of the required function calls for each kind of block.
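
As a rough illustration of the kind of block setup found in main.c, the following minimal C sketch creates and connects a unicast stream (error handling omitted; the sample itself spreads these calls across its block-specific files):

    #include <nvscistream.h>

    /* Minimal unicast setup sketch: pool -> producer -> consumer <- queue.
     * Error handling omitted for brevity. */
    NvSciStreamBlock producer, consumer, pool, queue;

    /* Create a static pool with 3 packets and attach it to the producer. */
    NvSciStreamStaticPoolCreate(3U, &pool);
    NvSciStreamProducerCreate(pool, &producer);

    /* Create a mailbox queue and attach it to the consumer. */
    NvSciStreamMailboxQueueCreate(&queue);
    NvSciStreamConsumerCreate(queue, &consumer);

    /* Connect the producer and consumer endpoints to complete the stream. */
    NvSciStreamBlockConnect(producer, consumer);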

Instead of performing a specific sequence of operations and waiting for specific events to arrive at each block, this application supports a general event-loop-driven model. The recommended approach is to use the NvSciEventNotifiers generated for each block and a single main thread that can wait for events on all blocks simultaneously. When an event arrives on a block, it is directed to an appropriate block-specific function that handles it. Events not associated with NvSciStream can also be bound to NvSciEventNotifiers and handled in the same loop. This makes for a more robust application design.
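
As a hedged sketch of this model (assuming the event-service binding calls declared in the NvSciStream and NvSciEvent headers; the sample's actual loop registers notifiers for many blocks):

    /* Bind a block to an event loop service and handle its events centrally.
     * 'producer' is assumed to be a block created earlier; error handling omitted. */
    NvSciEventLoopService* loopService = NULL;
    NvSciEventNotifier*    notifier    = NULL;
    bool                   newEvent    = false;

    NvSciEventLoopServiceCreate(1U, &loopService);

    /* Request a notifier for the block instead of polling the block directly. */
    NvSciStreamBlockEventServiceSetup(producer, &loopService->EventService, &notifier);

    /* Wait (here up to 100 ms) for activity on any registered notifier, then
     * query and dispatch the pending NvSciStream event for the signaled block. */
    loopService->WaitForMultipleEvents(&notifier, 1U, 100000, &newEvent);
    if (newEvent) {
        NvSciStreamEventType eventType;
        NvSciStreamBlockEventQuery(producer, 0, &eventType);
        /* handleProducerEvent(eventType);  -- hypothetical block-specific handler */
    }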

Those preferring to handle each NvSciStream block separately can still wait for events on individual blocks. The sample illustrates this approach as well.
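
A per-block alternative, again only as a rough sketch, polls each block with NvSciStreamBlockEventQuery, typically from a dedicated thread per block:

    /* Per-block event loop sketch: poll one block until the stream disconnects.
     * 'consumer' is assumed to be a block created earlier. */
    NvSciStreamEventType eventType;
    const int64_t        timeoutUsec = 100000; /* 100 ms poll interval (example value) */

    for (;;) {
        NvSciError err = NvSciStreamBlockEventQuery(consumer, timeoutUsec, &eventType);
        if (err == NvSciError_Timeout) {
            continue;  /* nothing pending yet */
        }
        if (err != NvSciError_Success) {
            break;     /* report and handle the failure */
        }
        if (eventType == NvSciStreamEventType_Disconnected) {
            break;     /* stream torn down; exit the loop */
        }
        /* handleConsumerEvent(eventType);  -- hypothetical block-specific handler */
    }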

Prerequisites

With inter-process streaming, the sample application streams packets between a producer process and a consumer process via inter-process communication (NvSciIpc) channels. NvSciStream imposes minimum requirements on the NvSciIpc channels used to transmit the required messages: each NvSciIpc channel must be created with at least 24K (24576) bytes per frame.

The NvSciIpc channels are configured via a plain text file, /etc/nvsciipc.cfg. For more information on NvSciIpc configuration data, see the NvSciIpc Configuration Data chapter. The recommended NvSciIpc channels for these sample applications are as follows:

INTER_PROCESS    nvscistream_0    nvscistream_1    16    24576
INTER_PROCESS    nvscistream_2    nvscistream_3    16    24576
INTER_PROCESS    nvscistream_4    nvscistream_5    16    24576
INTER_PROCESS    nvscistream_6    nvscistream_7    16    24576
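
For reference, the producer and consumer processes open these NvSciIpc endpoints before creating the corresponding IPC blocks. A minimal, hedged sketch of that step (endpoint name taken from the configuration above; error handling omitted):

    #include <nvsciipc.h>

    /* Initialize NvSciIpc and open one endpoint of a configured channel.
     * The resulting handle is later passed to NvSciStreamIpcSrcCreate()
     * on the producer side or NvSciStreamIpcDstCreate() on the consumer side. */
    NvSciIpcEndpoint ipcEndpoint;
    NvSciIpcInit();
    NvSciIpcOpenEndpoint("nvscistream_0", &ipcEndpoint);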

Where inter-chip streaming is used, the sample application streams packets between different chips via NvSciIpc (INTER_CHIP, PCIe) channels. For more information, see Chip to Chip Communication.

Building the NvSciStream Event-Driven Sample Application

The NvSciStream sample includes source code and a Makefile.

  1. On the host system, navigate to the sample application directory:
    cd <top>/drive-linux/samples/nvsci/nvscistream/event/   
  2. Build the sample application:
    make clean 
    make 

Running the NvSciStream Event-Driven Sample Application

By default, the event-driven sample application creates a single-process unicast stream from CUDA to CUDA, using a mailbox queue for the consumer and handling all events in a single loop.

This behavior can be modified with the following command-line switches. If running in multiple processes, "-p" and "-c" must be specified for the producer and all consumers, or the stream will not fully connect. The producer and consumers do not all need to reside in separate processes; they may be combined. Some options affect only the setup of the producer or of one of the consumers and are ignored if specified in the wrong process.

-m <count>
    Specifies the number of consumers.
    Set in the producer process.
    Default: 1

-f <count>
    Specifies the number of packets to create.
    Set in the producer process.
    Default: 3

-l <index> <limit>
    Adds a limiter block, with the specified packet limit, between the
    producer and the indexed consumer.
    Set in the producer process.

-q <index> {f|m}
    Uses a fifo (f) or mailbox (m) queue for the indexed consumer.
    Set in the consumer process.
    Default: f

-e {s|t}
    Handles events through a single event service (s) or through separate
    per-thread event loops for each block (t).
    Set in each process.
    Default: s

-u <index>
    Selects the use case:
        1: CUDA to CUDA
        2: NvMedia to CUDA (non-safety builds)
    Must be set to the same value in all processes.
    Default: 1

-i
    Optional. Sets endpoint info for the producer and consumers in this
    process and queries the info from the other endpoints.
    Set in each process.

For inter-process operation:

-p
    The producer resides in this process.

-c <index>
    The indexed consumer resides in this process.

For inter-chip operation:

-P <index> <Ipc endpoint>
    The NvSciIpc endpoint used by the producer to communicate with the
    indexed consumer on another chip.

-C <index> <Ipc endpoint>
    The indexed consumer resides in this process, on a different chip from
    the producer, and uses the specified NvSciIpc endpoint.
    "-C" and "-c" cannot be used simultaneously in one process.

-F <index> <count>
    Specifies the number of packets in the pool attached to the memory
    boundary IpcDst block of the indexed C2C consumer.
    Set in the consumer process.
    Default: 3

-Q <index> {f|m}
    Attaches a fifo (f) or mailbox (m) queue to the memory boundary IpcSrc
    block associated with the indexed consumer.
    Set in the producer process.
  1. Copy the sample application to the target filesystem:
    cp <top>/drive-linux/samples/nvsci/nvscistream/event/nvscistream_event_sample \
        <top>/drive-linux/filesystem/targetfs/home/nvidia/
  2. The following are several examples of how to run the sample application with different configurations:
    • Single-process unicast with default setup:
      ./nvscistream_event_sample 
    • Single-process multicast to three consumers, with NvMedia to CUDA streaming and per-block event threads:
      ./nvscistream_event_sample -m 3 -u 2 -e t
    • Two consumers, with one in the same process as the producer and the other in a separate process. Both enable the endpoint info option:
      ./nvscistream_event_sample -m 2 -p -c 0 -i & 
      
      ./nvscistream_event_sample -c 1 -i &
      
    • Three consumers, with one in the same process as the producer and two in a separate process, and a mailbox queue for one of them:
      ./nvscistream_event_sample -m 3 -p -c 0 &
      ./nvscistream_event_sample -c 1 -c 2 -q 2 m &
    • Multi-process CUDA to CUDA stream with one consumer on another SoC. A FIFO queue is attached to the memory boundary IpcSrc block, and a 3-packet pool is attached to the memory boundary IpcDst block. It uses the NvSciIpc channel <pcie_s0_1> <pcie_s1_1>.

      On chip s0:
      ./nvscistream_event_sample -P 0 pcie_s0_1 -Q 0 f

      On chip s1:

      ./nvscistream_event_sample -C 0 pcie_s1_1 -F 0 3
    • Four consumers, with one in the same process as the producer, one in another process but on the same chip as the producer, and two in another process on another chip.

      Both the third and fourth consumers have a mailbox queue attached to the memory boundary IpcSrc block and a 5-packet pool attached to the memory boundary IpcDst block.

      Inter-chip NvSciIpc channels used by the third and fourth consumers:

      • <pcie_s0_1> <pcie_s1_1>
      • <pcie_s0_2> <pcie_s1_2>

      On chip s0:

      ./nvscistream_event_sample -m 4 -c 0 -q 0 m -Q 2 m -Q 3 m -P 2 pcie_s0_1 -P 3 pcie_s0_2 &
      ./nvscistream_event_sample -c 1 -q 1 m &
      

      On chip s1:

      ./nvscistream_event_sample -C 2 pcie_s1_1 -q 2 f -F 2 5 -C 3 pcie_s1_2 -q 3 m -F 3 5
Note:

The nvscistream_event_sample application must be run as root user (with sudo).

If the nvscistream_event_sample application fails to open the IPC channel, cleaning up NvSciIpc resources may help.

sudo rm -rf /dev/mqueue/*
sudo rm -rf /dev/shm/*

For inter-chip use cases:

Ensure that different SoCs are configured with different SoC IDs. See the "Bind Options for SOC ID for C2C in GOS-DT" section in the AV PCT Configuration topic.