Get an NvSciSyncFence.
cudaSignalExternalSemaphoresAsync takes a valid NvSciSyncFence as input. Upon return, the fence tracks the completion of all work submitted to that CUDA stream before the call, so waiting on the fence is equivalent to waiting for that work to complete. When the work completes, the NvSciSyncFence is signaled and any waiters on it are unblocked. The signal is issued asynchronously on the GPU (i.e., the call returns to the calling thread immediately). Applications can optionally set the flag CUDA_EXTERNAL_SEMAPHORE_SIGNAL_SKIP_NVSCIBUF_MEMSYNC to skip the memory synchronization operations that are otherwise performed by default over all NvSciBuf objects imported into CUDA (in that process) to ensure data coherency with other importers of the same NvSciBuf memory objects. Use this flag when CUDA-NvSciSync is used only to build control dependencies (i.e., no data is shared between the signaler and the waiter).
The cudaWait|SignalExternalSemaphoresAsync APIs take an array of cudaExternalSemaphore_t and an array of cudaExternalSemaphoreWait|SignalParams. This allows the application to enqueue one or more external semaphore objects in a single call, each being one of the cudaExternalSemaphoreHandleType types. It is an efficient way to describe a dependency between a CUDA stream and more than one NvSciSyncFence as a single operation.
cudaSignalExternalSemaphoresAsync overwrites the previous contents of the NvSciSyncFence passed to it.