Chip to Chip Communication#

This topic applies only to Linux builds running on Thor platforms.

The NVIDIA Software Communication Interface for Chip to Chip over direct PCIe connection (NvSciC2cPcie) lets user applications exchange data between two NVIDIA DRIVE AGX™ DevKits interconnected over a direct PCIe connection. The direct PCIe connection is between the first NVIDIA DRIVE AGX DevKit, acting as a PCIe Root Port, and the second NVIDIA DRIVE AGX DevKit, acting as a PCIe Endpoint.

Supported Platform Configurations#

Platform

  • NVIDIA DRIVE AGX Thor DevKit

SoC

  • NVIDIA DRIVE AGX Thor as PCIe Root Port

  • NVIDIA DRIVE AGX Thor as PCIe Endpoint

Topology

  • NVIDIA DRIVE AGX DevKit as PCIe Root Port <> NVIDIA DRIVE AGX DevKit as PCIe Endpoint

Figure: C2C PCIe miniSAS connection between the two NVIDIA DRIVE AGX DevKits.

Platform Setup#

The following platform configurations are required for NvSciC2cPcie communication with NVIDIA DRIVE AGX DevKit. A similar connection is required for other platforms.

  • Connect miniSAS Port-A of NVIDIA DRIVE AGX DevKit-1 to miniSAS Port-A of NVIDIA DRIVE AGX DevKit-2 with a PCIe miniSAS cable.

  • For custom platforms, the PCIe controller lane and clock configuration must be adjusted accordingly.

  • Each PCIe controller in the NVIDIA DRIVE AGX DevKits has a PCIe eDMA engine. NvSciC2cPcie uses only one DMA write channel of the assigned PCIe controller for all NvSciC2cPcie transfers. Transfers are serviced in FIFO order; there is no load balancing or scheduling policy to prioritize a specific request.

Example:

For NVIDIA DRIVE AGX DevKit as PCIe Root Port

${NV_WORKSPACE}/drive-foundation/make/bind_partitions -b <board_name> linux ENABLE_T264_PCIE_C5=y

For NVIDIA DRIVE AGX DevKit as PCIe Endpoint

${NV_WORKSPACE}/drive-foundation/make/bind_partitions -b <board_name> linux ENABLE_T264_PCIE_C5_EP=y

Refer to DRIVE Linux on Thor Silicon for the flashing process.

Execution Setup#

Linux Kernel Module Insertion#

Before user applications can exercise the NvSciC2cPcie interface, insert the Linux kernel modules for NvSciC2cPcie. They are not loaded by default on NVIDIA DriveOS™ Linux boot. To insert the required Linux kernel modules:

On the first SoC, configured as PCIe Root Port:

sudo modprobe nvscic2c-pcie-epc

On the second SoC, configured as PCIe Endpoint:

sudo modprobe nvscic2c-pcie-epf

It is recommended to load the nvscic2c-pcie-ep* kernel modules immediately after boot. This allows the NvSciC2cPcie software stack to allocate contiguous physical pages for its internal operation for each configured NvSciC2cPcie endpoint.
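A minimal way to confirm that the modules loaded, assuming standard Linux tooling (exact module and log names may vary):

# On either SoC, list the loaded NvSciC2cPcie kernel modules
lsmod | grep nvscic2c

# Optionally, inspect kernel messages for NvSciC2cPcie initialization
dmesg | grep -i nvscic2c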

PCIe Hot-Plug#

Once the modules are loaded, the DevKit enabled as PCIe Endpoint is hot-plugged and enumerated as a PCIe device by the DevKit configured as PCIe Root Port. The following must be executed on the DevKit configured as PCIe Endpoint:

sudo -s
CTL_BASE=a808480000
cd /sys/kernel/config/pci_ep/
mkdir functions/nvscic2c_epf_22CC/func
echo 0x10DE > functions/nvscic2c_epf_22CC/func/vendorid
echo 0x22CC > functions/nvscic2c_epf_22CC/func/deviceid

echo 0x10DE > functions/nvscic2c_epf_22CC/func/subsys_vendor_id
echo 16 > functions/nvscic2c_epf_22CC/func/msi_interrupts

ln -s functions/nvscic2c_epf_22CC/func controllers/$CTL_BASE.pcie_ep
echo 0 > controllers/$CTL_BASE.pcie_ep/start
echo 1 > controllers/$CTL_BASE.pcie_ep/start
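After the start toggle, the PCIe Endpoint should enumerate on the DevKit configured as PCIe Root Port. A quick sanity check, using the vendor and device IDs programmed above (assuming lspci is available on the target):

# Run on the DevKit configured as PCIe Root Port
lspci -d 10de:22cc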

The previous steps, including Linux kernel module insertion, can be added as a Linux systemd service so that the NvSciC2cPcie software is automatically available at boot, as sketched below.
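The following is a minimal sketch of such a service for the DevKit configured as PCIe Endpoint. The unit name, helper script path, and ordering target are illustrative and must be adapted to the deployment; the helper script is assumed to contain the modprobe and configfs commands shown above.

# Hypothetical unit file written from a shell; adapt paths and names as needed
cat <<'EOF' | sudo tee /etc/systemd/system/nvscic2c-pcie-ep.service
[Unit]
Description=NvSciC2cPcie endpoint setup (module insertion and PCIe hot-plug)
After=multi-user.target

[Service]
Type=oneshot
# Hypothetical script containing: modprobe nvscic2c-pcie-epf + configfs steps
ExecStart=/usr/local/bin/nvscic2c-pcie-ep-setup.sh
RemainAfterExit=yes

[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable nvscic2c-pcie-ep.service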

NvSciIpc (INTER_CHIP, PCIe) Channels

When the PCIe connection completes, the following NvSciIpc channels are available for use with NvStreams producer or consumer applications.

Channel Name on DevKit as PCIe Root Port    Channel Name on DevKit as PCIe Endpoint

nvscic2c_pcie_s0_c5_1                       nvscic2c_pcie_s1_c5_1
nvscic2c_pcie_s0_c5_2                       nvscic2c_pcie_s1_c5_2
nvscic2c_pcie_s0_c5_3                       nvscic2c_pcie_s1_c5_3
nvscic2c_pcie_s0_c5_4                       nvscic2c_pcie_s1_c5_4
nvscic2c_pcie_s0_c5_5                       nvscic2c_pcie_s1_c5_5
nvscic2c_pcie_s0_c5_6                       nvscic2c_pcie_s1_c5_6
nvscic2c_pcie_s0_c5_7                       nvscic2c_pcie_s1_c5_7
nvscic2c_pcie_s0_c5_8                       nvscic2c_pcie_s1_c5_8
nvscic2c_pcie_s0_c5_9                       nvscic2c_pcie_s1_c5_9
nvscic2c_pcie_s0_c5_10                      nvscic2c_pcie_s1_c5_10
nvscic2c_pcie_s0_c5_11                      nvscic2c_pcie_s1_c5_11
nvscic2c_pcie_s0_c5_12                      nvscic2c_pcie_s1_c5_12

  • nvscic2c_pcie_s0_c5_12 and nvscic2c_pcie_s1_c5_12 are not available for use with NvStreams over the Chip to Chip connection; they are reserved as NvSciIpc over Chip to Chip (INTER_CHIP, PCIe) channels for short and infrequent general-purpose data. Refer to NvSciIpc API Usage.

If the user application on the NVIDIA DRIVE AGX DevKit as PCIe Root Port opens nvscic2c_pcie_s0_c5_1, then the peer user application on the other NVIDIA DRIVE AGX DevKit as PCIe Endpoint must open nvscic2c_pcie_s1_c5_1 to exchange data across the SoCs, and likewise for the remaining channels listed previously.

Each NvSciIpc (INTER_CHIP, PCIe) channel is configured with 16 frames of 32 KB per frame by default.
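Once the connection is established, the channels can be confirmed on each DevKit, for example:

# List the NvSciC2cPcie device nodes created for the configured channels
ls /dev/nvscic2c_*

# List the INTER_CHIP_PCIE channel entries registered with NvSciIpc
grep INTER_CHIP_PCIE /etc/nvsciipc.cfg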

Reconfiguration#

The following reconfiguration information is based on the default NvSciC2cPcie support offered for NVIDIA DRIVE AGX DevKit.

Different platforms, or a different PCIe controller configuration on the same NVIDIA DRIVE AGX DevKit, require a new set of device-tree node entries for NvSciC2cPcie on the PCIe Root Port (nvidia,tegra-nvscic2c-pcie-epc) and the PCIe Endpoint (nvidia,tegra-nvscic2c-pcie-epf). For example, a change in PCIe controller ID, or a change in the role of a PCIe controller from PCIe Root Port to PCIe Endpoint (or vice versa) relative to the default NVIDIA DRIVE AGX DevKit configuration, requires such changes. These changes are made through device-tree node modifications or additions, but it is not straightforward to document them all; they are one-time changes and should be made in coordination with your NVIDIA point of contact.

BAR Size#

The BAR size for the NVIDIA DRIVE AGX DevKit as PCIe Endpoint is configured to 1 GB by default. When required, it can be reduced or increased by modifying the nvidia,bar-win-size property of the device-tree node nvscic2c-pcie-s1-c5-epf.

File:

${NV_WORKSPACE}/drive-linux/kernel/source/hardware/nvidia/platform/t264/automotive/kernel-dts/p3960/common/tegra264-p3960-nvscic2c-pcie.dtsi

Example:

nvscic2c-pcie-s1-c5-epf {
    compatible = "nvidia,tegra-nvscic2c-pcie-epf";
--   nvidia,bar-win-size = <0x40000000>;  /* 1GB. */
++ nvidia,bar-win-size = <0x20000000>;  /* 512MB. */
};

The configured BAR size must be a power of 2 and a minimum of 64 MB. The maximum BAR size depends on the size of prefetchable memory supported by the PCIe Root Port.
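For reference, a small sketch to derive the hexadecimal nvidia,bar-win-size value for a given power-of-2 size in MB:

# Print the nvidia,bar-win-size value for a few power-of-2 sizes (in MB)
for mb in 64 128 256 512 1024; do
    printf "%4d MB -> 0x%08x\n" "$mb" $((mb * 1024 * 1024))
done
# 1024 MB -> 0x40000000 matches the default 1 GB shown in the example above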

NvSciIpc (INTER_CHIP, PCIe) Channel Properties#

The NvSciIpc (INTER_CHIP, PCIe) channel properties can be modified on a use-case basis.

Modify Channel Properties

Changes to the channel properties (frame count and frame size) must be made for both the NVIDIA DRIVE AGX DevKit as PCIe Root Port and the NVIDIA DRIVE AGX DevKit as PCIe Endpoint, in the device-tree nodes nvscic2c-pcie-s0-c5-epc and nvscic2c-pcie-s1-c5-epf, respectively.

File: ${NV_WORKSPACE}/drive-linux/kernel/source/hardware/nvidia/platform/t264/automotive/kernel-dts/p3960/common/tegra264-p3960-nvscic2c-pcie.dtsi

The following example shows a change in frame count for the NvSciIpc (INTER_CHIP, PCIe) channels nvscic2c_pcie_s0_c5_2 (PCIe Root Port) and nvscic2c_pcie_s1_c5_2 (PCIe Endpoint):

nvscic2c-pcie-s0-c5-epc {
    nvidia,endpoint-db =
        "nvscic2c_pcie_s0_c5_1,     16,     00032768,    67108864,    26001",
--      "nvscic2c_pcie_s0_c5_2,     16,     00032768,    67108864,    26002",
++      "nvscic2c_pcie_s0_c5_2,     08,     00032768,    67108864,    26002",
        "nvscic2c_pcie_s0_c5_3,     16,     00032768,    67108864,    26003",
        …..
};
nvscic2c-pcie-s1-c5-epf {
    nvidia,endpoint-db =
        "nvscic2c_pcie_s1_c5_1,     16,     00032768,    67108864,    26101",
--      "nvscic2c_pcie_s1_c5_2,     16,     00032768,    67108864,    26102",
++      "nvscic2c_pcie_s1_c5_2,     08,     00032768,    67108864,    26102",
        "nvscic2c_pcie_s1_c5_3,     16,     00032768,    67108864,    26103",
        …..
};

The following example shows a change in frame size for the NvSciIpc (INTER_CHIP, PCIe) channels nvscic2c_pcie_s0_c5_2 (PCIe Root Port) and nvscic2c_pcie_s1_c5_2 (PCIe Endpoint):

nvscic2c-pcie-s0-c5-epc {
    nvidia,endpoint-db =
        "nvscic2c_pcie_s0_c5_1,     16,     00032768,    67108864,    26001",
--      "nvscic2c_pcie_s0_c5_2,     16,     00032768,    67108864,    26002",
++      "nvscic2c_pcie_s0_c5_2,     16,     00028672,    67108864,    26002",
        "nvscic2c_pcie_s0_c5_3,     16,     00032768,    67108864,    26003",
        …..
};
nvscic2c-pcie-s1-c5-epf {
    nvidia,endpoint-db =
        "nvscic2c_pcie_s1_c5_1,     16,     00032768,    67108864,    26101",
--      "nvscic2c_pcie_s1_c5_2,     16,     00032768,    67108864,    26102",
++      "nvscic2c_pcie_s1_c5_2,     16,     00028672,    67108864,    26102",
        "nvscic2c_pcie_s1_c5_3,     16,     00032768,    67108864,    26103",
        …..
};

The channel properties have these limits:

  • Frame count: minimum: 1, maximum: 64

  • Frame size: minimum: 64B, maximum: 32 KB. Must be aligned to 64 B.

  • In streaming mode, the total size of buffers that can be mapped per endpoint is 64 MB (67108864 bytes). The 64 MB limit is derived as (BAR_SIZE - PROTOCOL_OVERHEAD) divided equally among the maximum number of streaming endpoints (15), where PROTOCOL_OVERHEAD = 54 MB. If required, the per-endpoint limit can be updated in the device tree, provided the total across all endpoints stays within (BAR_SIZE - PROTOCOL_OVERHEAD); see the sketch after this list.
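A quick arithmetic check of the default per-endpoint streaming limit, assuming the default 1 GB BAR and 54 MB protocol overhead:

# (BAR_SIZE - PROTOCOL_OVERHEAD) divided among 15 streaming endpoints, in MB
echo $(( (1024 - 54) / 15 ))    # 64 -> 64 MB per endpoint
echo $(( 64 * 1024 * 1024 ))    # 67108864 bytes, as used in nvidia,endpoint-db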

New Channel Addition

To introduce additional NvSciIpc (INTER_CHIP, PCIe) channels, the change must be made for both the NVIDIA DRIVE AGX DevKit as PCIe Root Port and the NVIDIA DRIVE AGX DevKit as PCIe Endpoint, in the device-tree nodes nvscic2c-pcie-s0-c5-epc and nvscic2c-pcie-s1-c5-epf, respectively.

File: ${NV_WORKSPACE}/drive-linux/kernel/source/hardware/nvidia/platform/t264/automotive/kernel-dts/p3960/common/tegra264-p3960-nvscic2c-pcie.dtsi

Example:

nvscic2c-pcie-s0-c5-epc {
    nvidia,endpoint-db =
        "nvscic2c_pcie_s0_c5_1,     16,     00032768,    67108864,    26001",
        ……
--      "nvscic2c_pcie_s0_c5_12,     16,     00000064,    0,    26012";
++      "nvscic2c_pcie_s0_c5_12,     16,     00000064,    0,    26012",
++      "nvscic2c_pcie_s0_c5_13,     16,     00032768,    67108864,    26013";
};
nvscic2c-pcie-s1-c5-epf {
    nvidia,endpoint-db =
        "nvscic2c_pcie_s1_c5_1,     16,     00032768,    67108864,    26101",
        ……
--      "nvscic2c_pcie_s1_c5_12,     16,     00000064,    0,    26112";
++      "nvscic2c_pcie_s1_c5_12,     16,     00000064,    0,    26112",
++      "nvscic2c_pcie_s1_c5_13,     16,     00032768,    67108864,    26113";
};

File: /etc/nvsciipc.cfg (on target)

INTER_CHIP_PCIE      nvscic2c_pcie_s0_c5_12   26012
++ INTER_CHIP_PCIE      nvscic2c_pcie_s0_c5_13   26013
…..
INTER_CHIP_PCIE      nvscic2c_pcie_s1_c5_12   26112
++ INTER_CHIP_PCIE      nvscic2c_pcie_s1_c5_13   26113

Changes can also be made to modify or remove existing NvSciIpc (INTER_CHIP, PCIe) channels.

For a given pair of NVIDIA DRIVE AGX DevKit as PCIe Root Port and NVIDIA DRIVE AGX DevKit as PCIe Endpoint, the maximum number of NvSciIpc (INTER_CHIP, PCIe) channels supported is 16.

Similarly, for other platforms, the corresponding *.dtsi files require modification.
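After reflashing with the updated device tree and updating /etc/nvsciipc.cfg on both DevKits, the new channel can be confirmed in the same way as the default channels, for example:

# On the DevKit configured as PCIe Root Port
grep nvscic2c_pcie_s0_c5_13 /etc/nvsciipc.cfg
ls /dev/nvscic2c_*

# On the DevKit configured as PCIe Endpoint
grep nvscic2c_pcie_s1_c5_13 /etc/nvsciipc.cfg
ls /dev/nvscic2c_*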

PCIe Hot-Unplug#

To tear down the connection between the PCIe Root Port and the PCIe Endpoint, hot-unplug the PCIe Endpoint from the PCIe Root Port. Refer to the Restrictions section for more information.

PCIe Hot-Unplug is always executed from the PCIe Endpoint (NVIDIA DRIVE AGX DevKit) by initiating the power-down of the PCIe Endpoint controller and subsequently unbinding the nvscic2c-pcie-epf module from the PCIe Endpoint controller.

Prerequisite: PCIe Hot-Unplug must be attempted only when the PCIe Endpoint is successfully hot-plugged into PCIe Root Port and NvSciIpc(INTER_CHIP, PCIE) channels are enumerated.

To hot-unplug, execute the following on the NVIDIA DRIVE AGX DevKit configured as PCIe Endpoint. This makes the NvSciIpc(INTER_CHIP, PCIE) channels disappear on both of the PCIe inter-connected NVIDIA DRIVE AGX DevKits.

sudo -s
CTL_BASE=a808480000
cd /sys/kernel/config/pci_ep/

Check NvSciC2cPcie device nodes are available:

ls /dev/nvscic2c_*

The previous command should list NvSciC2cPcie device nodes for the corresponding PCIe Root Port and PCIe Endpoint connection. Continue with the following set of commands:

echo 0 > controllers/$CTL_BASE.pcie_ep/start

Wait until the NvSciC2cPcie device nodes disappear. The following command can be used to check for NvSciC2cPcie device node availability:

ls /dev/nvscic2c_*

Once the device nodes disappear, execute:

unlink controllers/$CTL_BASE.pcie_ep/func

Successful PCIe hot-unplug of the PCIe Endpoint from the PCIe Root Port makes the listed NvSciIpc(INTER_CHIP, PCIE) channels go away on both NVIDIA DRIVE AGX DevKits, after which you can proceed with power-cycling or powering off one or both DevKits.
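The hot-unplug steps above can also be scripted. A minimal sketch, run as root on the DevKit configured as PCIe Endpoint, that waits for the device nodes to disappear before detaching the function (the timeout is illustrative):

#!/bin/bash
set -e
CTL_BASE=a808480000
cd /sys/kernel/config/pci_ep/

# Stop the endpoint controller
echo 0 > controllers/$CTL_BASE.pcie_ep/start

# Wait up to ~30 seconds for the NvSciC2cPcie device nodes to disappear
for i in $(seq 1 30); do
    ls /dev/nvscic2c_* >/dev/null 2>&1 || break
    sleep 1
done

# Detach the endpoint function once the device nodes are gone
unlink controllers/$CTL_BASE.pcie_ep/func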

PCIe Hot-Replug#

To re-establish the PCIe connection between the PCIe Endpoint and the PCIe Root Port, you must hot-replug the PCIe Endpoint to the PCIe Root Port.

If both SoCs are power-cycled after the PCIe Hot-Unplug, follow the usual PCIe Hot-Plug steps. However, if only one of the two SoCs is power-cycled or rebooted, PCIe Hot-Replug is required to re-establish the connection between them.

Prerequisite: PCIe Hot-Replug is attempted when one of the two SoCs is power-cycled/rebooted after a successful PCIe Hot-Unplug between them. If both SoCs were power-cycled/rebooted, the same steps as listed in the Execution Setup section are required to establish the PCIe connection between them.

When only the PCIe Root Port SoC is power-cycled/rebooted#

On PCIe Root Port SoC (NVIDIA DRIVE AGX DevKit)

Follow the same steps as listed in Linux Kernel Module Insertion.

On PCIe Endpoint SoC (NVIDIA DRIVE AGX DevKit)

sudo -s
CTL_BASE=a808480000
cd /sys/kernel/config/pci_ep/
ln -s functions/nvscic2c_epf_22CC/func controllers/$CTL_BASE.pcie_ep
echo 0 > controllers/$CTL_BASE.pcie_ep/start
echo 1 > controllers/$CTL_BASE.pcie_ep/start

When only the PCIe Endpoint SoC is power-cycled/rebooted#

On PCIe Endpoint SoC (NVIDIA DRIVE AGX DevKit)

Follow the steps in Execution Setup.

On PCIe Root Port SoC (NVIDIA DRIVE AGX DevKit)

Nothing is required. The module is already inserted.
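Optionally, once the PCIe Endpoint SoC completes its hot-plug steps again, you can confirm on the PCIe Root Port SoC that the NvSciC2cPcie device nodes reappear:

# Run on the DevKit configured as PCIe Root Port
ls /dev/nvscic2c_*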

SoC Error#

The only SoC Error scenario is when one or both of the PCIe Root Port SoC and the PCIe Endpoint SoC connected over NvSciC2cPcie hit a Linux kernel oops or panic. Applications might observe timeouts.

Reconnection#

On the SoC that is still functional and responsive, the user must follow the same restrictions as for PCIe Hot-Unplug. Once the applications on that SoC exit or the pipeline is purged, the user must recover the faulty SoC by rebooting or resetting it.

Only then, subsequently:
  

  • If the functional (non-faulty) SoC was the PCIe Endpoint SoC, follow the PCIe Hot-Unplug and PCIe Hot-Replug steps listed above on the PCIe Endpoint SoC, and on the recovered SoC (PCIe Root Port SoC) follow the steps in the Linux Kernel Module Insertion subsection.

  • If the functional (non-faulty) SoC was the PCIe Root Port SoC, nothing needs to be done on that SoC, but on the recovered SoC (PCIe Endpoint SoC) follow the steps in the PCIe Hot-Plug section.

  • If both SoCs were faulty, then after recovering each of the two SoCs, it becomes the usual case of Linux Kernel Module Insertion and PCIe Hot-Plug, as done to establish the PCIe connection between them initially.

Successful error recovery and PCIe reconnection makes the channels reappear and become available for use again.

PCIe Error#

PCIe EDMA Transfer Errors#

PCIe EDMA transfer errors can lead to data loss.

Recovery: The PCIe EDMA engine is sanitized once all pipelined transfers are returned. Recovery from PCIe EDMA errors is not guaranteed; therefore, it is recommended to retry streaming, and if the errors persist, PCIe link recovery is required. For PCIe link recovery, PCIe Hot-Unplug followed by PCIe Hot-Replug is required on the PCIe Endpoint SoC.

SC-7 Suspend and Resume Cycle#

Follow the same set of restrictions and assumptions for the SC-7 suspend and resume cycle as listed in PCIe Hot-Unplug and PCIe Hot-Replug. Before one or both of the two interconnected SoCs enter SC-7 suspend, PCIe Hot-Unplug must be carried out, keeping the set of restrictions applicable for PCIe Hot-Unplug. Once one or both of the two interconnected SoCs exit SC-7 suspend (SC-7 resume), the same steps as listed in PCIe Hot-Replug are required.

Assumptions#

  • NVIDIA Software Communication Interface for Chip to Chip (NvSciC2cPcie) is offered only between an inter-connected NVIDIA DRIVE SoC as PCIe Root Port and an NVIDIA DRIVE SoC as PCIe Endpoint. Producer buffers are copied onto remote consumer buffers pinned to PCIe memory using the PCIe eDMA engine.

  • NVIDIA Software Communication Interface for Chip to Chip (NvSciC2cPcie) is offered from a single Guest OS Virtual Machine of a NVIDIA DRIVE SoC as PCIe Root Port to/from a single Guest OS Virtual Machine of another NVIDIA DRIVE SoC as PCIe Endpoint.

  • User applications are responsible for tearing down an ongoing Chip to Chip transfer pipeline on all SoCs in a coordinated and graceful manner.

  • Out-of-the-box support is ensured for an NVIDIA DRIVE AGX DevKit inter-connected with another NVIDIA DRIVE AGX DevKit.

    • In this configuration, the default is PCIe controller C5 in PCIe Root Port mode on one DevKit and PCIe controller C5 in PCIe Endpoint mode on the other. Any change in PCIe controller mode, or a move to another set of PCIe controllers for NvSciC2cPcie, requires changes in the tegra264-p3960-nvscic2c-pcie.dtsi device-tree include file.

Restrictions#

  • Before powering off or power-cycling one of the two PCIe inter-connected NVIDIA DRIVE AGX DevKits, while one DevKit is PCIe hot-plugged into the other, you must tear down the PCIe connection between them (PCIe Hot-Unplug).

  • Before tearing down the PCIe connection between the two SoCs (PCIe Hot-Unplug), all applications or streaming pipelines using the corresponding NvSciIpc(INTER_CHIP, PCIE) channels on both SoCs must exit or be purged. Before they exit or are purged, each in-use NvSciIpc(INTER_CHIP, PCIE) channel must be closed with NvSciIpcCloseEndpointSafe().

  • On the two PCIe inter-connected NVIDIA DRIVE AGX DevKits, before closing a corresponding NvSciIpc(INTER_CHIP, PCIE) channel with NvSciIpcCloseEndpointSafe(), you must ensure the following for that channel:

    • No pipelined NvSciSync waits are pending.

    • All the NvSciIpc (INTER_CHIP, PCIE) channel messages sent have been received.

    • All the NvSciBuf and NvSciSync source and target handles, export and import handles, registered and CPU-mapped with the NvSciC2cPcie layer, are unregistered and their mappings deleted from the NvSciC2cPcie layer by invoking the relevant NvSciC2cPcie programming interfaces.

  • Unloading of NvSciC2cPcie Linux kernel modules is not supported.

  • Errors in NvSciC2cPcie transfers other than PCIe AER and PCIe EDMA transfer errors lead to timeouts in the software layers exercising NvSciC2cPcie.

  • Chip to Chip (NvSciC2cPcie) communication accepts a maximum of 1022 NvSciBufObjects and 1022 NvSciSyncObjects for NvStreams over Chip to Chip communication, subject to system limits.