MCU_FOH Integration Steps

Failover Handler on MCU

Functionalities

  • Monitors the SOC_ERROR pin (operated in Toggle mode), which NVIDIA DRIVE AGX Thor™ uses to signal its safety status, and reports any detected error to the MCU Error Handler module.

  • Communicates with the FSI on NVIDIA DRIVE AGX Thor over SPI 2, acting as the SPI initiator, to obtain NVIDIA DRIVE AGX Thor failure information.

  • Notifies the MCU Error Handler module to initiate a failover when the NVIDIA DRIVE AGX Thor FSI reports a critical error.

  • Forwards the error status received in the periodic SPI frames to the customer application for persistent storage.

  • Provides an interface to notify the MCU application of the periodic failure status received from the System Error Handler (SEH) of the Thor FSI.

  • Provides an interface to notify the MCU application of the NVIDIA DRIVE AGX Thor SoC error pin status.

  • Provides an interface for the MCU application to send any data to the FSI. For example, the MCU application can fetch data from the persistent memory of the MCU and send it over this interface.

  • Handles the following hardware error scenarios:

    • SOC_ERROR pin stuck at a fixed level (no longer toggling) in Toggle mode

    • SPI channel errors
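The stuck-at check on SOC_ERROR in Toggle mode can be illustrated with a minimal sketch. It assumes that the pin level is sampled once per activation of a 1 ms periodic runnable and that a fault is declared when the level has not changed for a fixed number of consecutive samples. The threshold, the names `TogglePinMonitor`, `TogglePinMonitor_Sample`, and `PinMonitorResult`, and the result values are hypothetical and not taken from MCU FOH; reading the pin through the MCU Dio driver is replaced by a plain `level` parameter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical threshold: number of consecutive samples without a level
 * change (after the last observed edge) at which the pin counts as stuck. */
#define STUCK_THRESHOLD_SAMPLES 5u

typedef enum {
    PIN_TOGGLING = 0,  /* an edge was seen recently */
    PIN_STUCK_LOW,     /* no edge for STUCK_THRESHOLD_SAMPLES, level low */
    PIN_STUCK_HIGH     /* no edge for STUCK_THRESHOLD_SAMPLES, level high */
} PinMonitorResult;

typedef struct {
    bool     initialized;
    uint8_t  lastLevel;      /* level seen at the previous sample */
    uint32_t samplesNoEdge;  /* consecutive samples without a level change */
} TogglePinMonitor;

/* Called once per monitoring period with the current pin level (0 or 1). */
PinMonitorResult TogglePinMonitor_Sample(TogglePinMonitor *m, uint8_t level)
{
    if (!m->initialized || level != m->lastLevel) {
        /* First sample or an edge: restart the stuck-at counter. */
        m->initialized = true;
        m->lastLevel = level;
        m->samplesNoEdge = 0u;
        return PIN_TOGGLING;
    }
    if (m->samplesNoEdge < STUCK_THRESHOLD_SAMPLES) {
        m->samplesNoEdge++;  /* saturates at the threshold */
    }
    if (m->samplesNoEdge >= STUCK_THRESHOLD_SAMPLES) {
        return (level != 0u) ? PIN_STUCK_HIGH : PIN_STUCK_LOW;
    }
    return PIN_TOGGLING;
}
```

With a threshold of 5, a pin that keeps the same level for five samples after its last edge is reported as stuck on the fifth such sample, and a later edge clears the condition. The real threshold has to be derived from the SOC_ERROR toggle period, which this section does not specify, and a production monitor may instead latch the fault once detected.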

Dependencies

  • Uses the GPIO and SPI hardware modules on the MCU. The drivers for these modules should be provided by the AUTOSAR stack vendor or a third-party vendor.

  • SPI configuration: the SPI on Thor is operated in CPU mode.

The communication bus between the MCU and the FSI for exchanging Safety Services communication uses the following SPI settings:

  Parameter            Value
  Baud rate            5000000 (5 Mbit/s)
  QSPI hardware unit   0
  CS line              2
  CS polarity          Low (active low)
  Data shift edge      Leading
  Data width           32 bits
  Transfer start       LS bit
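To illustrate what these settings mean on the wire, the sketch below captures them in a plain C structure and derives two consequences: the time needed to shift one 32-bit word at 5 Mbit/s, and the order in which bits leave the shift register when the transfer starts with the LS bit. The structure, field, and function names are hypothetical; in an AUTOSAR stack these settings belong to the SPI driver configuration produced by the vendor tooling, not to application code.

```c
#include <stdint.h>

/* Hypothetical mirror of the MCU-FSI SPI settings listed above. */
typedef struct {
    uint32_t baudRate;            /* bits per second */
    uint8_t  hwUnit;              /* QSPI hardware unit */
    uint8_t  csLine;              /* chip-select line */
    uint8_t  csActiveLow;         /* 1: CS asserted low */
    uint8_t  shiftOnLeadingEdge;  /* 1: data shift edge is the leading edge */
    uint8_t  dataWidthBits;       /* bits per data word */
    uint8_t  lsbFirst;            /* 1: transfer starts with the LS bit */
} McuFsiSpiConfig;

static const McuFsiSpiConfig kMcuFsiSpiConfig = {
    .baudRate = 5000000u,
    .hwUnit = 0u,
    .csLine = 2u,
    .csActiveLow = 1u,
    .shiftOnLeadingEdge = 1u,
    .dataWidthBits = 32u,
    .lsbFirst = 1u,
};

/* Time in nanoseconds to shift one data word (CS setup/hold gaps ignored). */
uint32_t SpiWordTimeNs(const McuFsiSpiConfig *cfg)
{
    return (uint32_t)(((uint64_t)cfg->dataWidthBits * 1000000000u) / cfg->baudRate);
}

/* Returns the bit of 'word' driven on the data line at clock 'n'
 * (n = 0 is the first bit transferred; n must be below dataWidthBits). */
uint8_t SpiBitAtClock(const McuFsiSpiConfig *cfg, uint32_t word, uint8_t n)
{
    uint8_t shift = cfg->lsbFirst ? n : (uint8_t)(cfg->dataWidthBits - 1u - n);
    return (uint8_t)((word >> shift) & 1u);
}
```

With the values above, one 32-bit word occupies the bus for 32 / 5000000 s = 6.4 µs, excluding chip-select and inter-word delays, and bit 0 of each word is the first bit on the line.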

Design Aspects

  • MCU FOH is modeled as an AUTOSAR software component (SWC) of type Complex Device Driver (CDD).


MCU FOH and FSI FOH Communication

SPI communication between FSI FOH and MCU FOH is used for the following purposes:

  • The FSI SEH sends a critical failure report to the MCU application software to initiate a failover and a transition to a safe state.

  • The FSI SEH sends the periodic failure status to the MCU application software for persistent storage.

  • The MCU application software sends its data to the FSI SEH.

Runnables

  • MCU FOH has Init, Periodic, Server, and Event-driven runnables:

  No.  Runnable                 Type                    Interface            Re-entrant
  1    McuFoh_Init              Init runnable           -                    No
  2    McuFoh_ErrPinMonitor     Periodic (1 ms)         -                    No
  3    McuFoh_TransmitSpiPkt    Periodic (10 ms)        -                    No
  4    McuFoh_SOC_Response      Server runnable         R_SpiRxProcess       No
  5    McuFoh_StartMonitoring   Event-driven runnable   P_StartMontitoring   No
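In an AUTOSAR integration, the RTE and OS activate these runnables (the periodic ones through timing events mapped to OS tasks). The sketch below is a deliberately simplified, non-AUTOSAR stand-in that only shows the intended call pattern: the init runnable once, the 1 ms runnable on every tick, and the 10 ms runnable on every tenth tick. The `Stub_` functions merely count their calls and stand in for the real MCU FOH runnables; `SchedulerTick` is a hypothetical name.

```c
#include <stdint.h>

/* Call counters so the stubs below have observable behavior. */
static uint32_t initCalls, errPinMonitorCalls, transmitSpiPktCalls;

/* Stubs standing in for McuFoh_Init, McuFoh_ErrPinMonitor and
 * McuFoh_TransmitSpiPkt; the real runnables are provided by MCU FOH. */
static void Stub_McuFoh_Init(void)           { initCalls++; }
static void Stub_McuFoh_ErrPinMonitor(void)  { errPinMonitorCalls++; }
static void Stub_McuFoh_TransmitSpiPkt(void) { transmitSpiPktCalls++; }

/* Hypothetical 1 ms tick handler; tickMs counts ticks since start-up.
 * Each runnable completes before the next activation, which respects
 * the "not re-entrant" property listed in the table. */
void SchedulerTick(uint32_t tickMs)
{
    if (tickMs == 0u) {
        Stub_McuFoh_Init();            /* init runnable: once, before the others */
        return;
    }
    Stub_McuFoh_ErrPinMonitor();       /* 1 ms period */
    if ((tickMs % 10u) == 0u) {
        Stub_McuFoh_TransmitSpiPkt();  /* 10 ms period */
    }
}
```

Over ticks 0 to 100, this calls the init stub once, the 1 ms stub 100 times, and the 10 ms stub 10 times. McuFoh_SOC_Response and McuFoh_StartMonitoring are activated by events (R_SpiRxProcess and P_StartMontitoring), so they are not part of the tick loop.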

Interfaces

  • P_ReportCriticalFailure (Sender-Receiver interface, Mcu_Foh: Sender)

Notifies the MCU application software of NVIDIA Thor failure reports.

API: Rte_Send_P_ReportCriticalFailure_FailureReport
Argument: FailureReport
Type: SS_NvSehCriticalFailure_t
Range/Enum/Value: a structure with the following fields:

  1. ErrorReportFrame, which contains:

    • ErrorCode: unique error code that identifies the NVIDIA Thor error, as defined in the Error ID specification.

    • Error Attribute: additional information associated with the error code, defined by the clients.

    • Timestamp: lower 32 bits of the NVIDIA Thor TSC time at which the error was detected.

    • ReporterId: unique identifier of the reporter of the error.

  2. SystemFailureId: NVIDIA Thor failure ID.

  3. MaturationState: maturation state of the NVIDIA Thor failure ID.

Notes for integrator: Applications such as the MCU Error Handler can connect to this interface to receive the failure reports sent by the NVIDIA Thor FSI. The MCU application software will use this report as follows:

Note

ErrorReportFrame contents are only valid if SystemFailureId=0xFFFF.
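A receiver such as the MCU Error Handler could consume the report along the lines of the sketch below, after obtaining it from the RTE (not shown). The structure layouts, the field widths, the macro `NV_SEH_RAW_ERROR_FAILURE_ID`, and the function `HandleCriticalFailure` are assumptions for illustration; use the generated `SS_NvSehCriticalFailure_t` definition of your integration. The one rule taken from this section is that ErrorReportFrame is interpreted only when SystemFailureId equals 0xFFFF.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical name for the SystemFailureId value 0xFFFF, the only value
 * for which ErrorReportFrame is valid. */
#define NV_SEH_RAW_ERROR_FAILURE_ID 0xFFFFu

/* Assumed layout; the real type is part of SS_NvSehCriticalFailure_t. */
typedef struct {
    uint32_t ErrorCode;
    uint32_t ErrorAttribute;
    uint32_t Timestamp;   /* lower 32 bits of the Thor TSC */
    uint16_t ReporterId;
} ErrorReportFrameSketch;

/* Assumed layout standing in for SS_NvSehCriticalFailure_t. */
typedef struct {
    ErrorReportFrameSketch ErrorReportFrame;
    uint16_t SystemFailureId;
    uint8_t  MaturationState;
} CriticalFailureSketch;

typedef struct {
    bool     failoverRequested;
    bool     frameValid;  /* true if ErrorReportFrame may be used */
    uint32_t errorCode;   /* copied only when frameValid is true */
    uint16_t failureId;
} CriticalFailureDecision;

CriticalFailureDecision HandleCriticalFailure(const CriticalFailureSketch *r)
{
    /* A critical failure report always leads to a failover request. */
    CriticalFailureDecision d = { true, false, 0u, r->SystemFailureId };
    if (r->SystemFailureId == NV_SEH_RAW_ERROR_FAILURE_ID) {
        d.frameValid = true;
        d.errorCode = r->ErrorReportFrame.ErrorCode;
    }
    return d;
}
```

The sketch requests a failover for every received report because the report is critical; how MaturationState should influence that decision is not specified in this section.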

  • P_SocErrPinNotif (Sender-Receiver interface, Mcu_Foh: Sender)

Notifies the MCU application software of the NVIDIA Thor SoC error pin status.

API: Rte_Write_P_SocErrPinNotif_SocErrPinStatus
Argument: SocErrPinStatus
Type: SocErrPinStatusDataType
Range/Enum/Value:

    • SOCERR_PIN_INIT (0x1)

    • SOCERR_PIN_DEASSERTED (0x55)

    • SOCERR_PIN_ASSERTED (0xAA)

Notes for integrator: Applications such as the MCU Error Handler can connect to this interface to receive the SoC error pin status, which indicates failures on Thor.

The MCU application software will use this report as follows:
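A receiving application could, for example, map the three status values to reactions as sketched below. The macros are redefined here with the values listed above so the block compiles on its own; `uint8_t` as the underlying type of `SocErrPinStatusDataType`, the `SocErrPinAction` enumeration, and `ClassifySocErrPinStatus` are assumptions, and treating SOCERR_PIN_ASSERTED as a failover trigger is an illustrative choice.

```c
#include <stdint.h>

/* Values as listed for P_SocErrPinNotif. */
#define SOCERR_PIN_INIT        0x01u
#define SOCERR_PIN_DEASSERTED  0x55u
#define SOCERR_PIN_ASSERTED    0xAAu

typedef enum {
    PIN_ACTION_NONE,      /* monitoring not started yet, or no error */
    PIN_ACTION_FAILOVER,  /* Thor signals an error on the pin */
    PIN_ACTION_INVALID    /* value outside the enumeration: corrupted */
} SocErrPinAction;

SocErrPinAction ClassifySocErrPinStatus(uint8_t status)
{
    switch (status) {
    case SOCERR_PIN_INIT:
    case SOCERR_PIN_DEASSERTED:
        return PIN_ACTION_NONE;
    case SOCERR_PIN_ASSERTED:
        return PIN_ACTION_FAILOVER;
    default:
        return PIN_ACTION_INVALID;
    }
}
```

0x55 and 0xAA differ in every bit (0x55 ^ 0xAA == 0xFF), and no single-bit corruption turns one listed value into another. Treating any other value as invalid, as the `default` branch does, therefore lets the receiver detect a corrupted status instead of misreading it.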

  • P_McuPeriodicReport (Sender-Receiver interface, Mcu_Foh: Sender)

Notifies the MCU application software of the NVIDIA Thor™ periodic failure status.

API: Rte_Send_P_McuPeriodicReport_PeriodicStatus
Argument: PeriodicStatus
Type: SS_NvSehSmcuPeriodicReport_t
Range/Enum/Value: array of 252 bytes
Notes for integrator: The 252-byte buffer is opaque to MCU FOH; MCU FOH passes this periodic status to the MCU application as-is. The application that receives the periodic reports connects to this interface and can choose to store them in NvM.
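Because the buffer is opaque, a receiving application only needs to copy the 252 bytes as a unit. The sketch below stores each report in a RAM mirror and marks it as dirty so that a background task can persist it later. `PeriodicReportStore`, `StorePeriodicReport`, `PeriodicReport_TakeDirty`, and the dirty-flag scheme are assumptions; the actual write should go through the NvM services of your AUTOSAR stack (for example `NvM_WriteBlock` on the corresponding block), which are not shown.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Size of the opaque SS_NvSehSmcuPeriodicReport_t buffer. */
#define PERIODIC_REPORT_SIZE 252u

/* Hypothetical RAM mirror of an NvM block holding the latest report. */
typedef struct {
    uint8_t  data[PERIODIC_REPORT_SIZE];
    bool     dirty;            /* true while the mirror awaits persisting */
    uint32_t reportsReceived;  /* number of reports stored so far */
} PeriodicReportStore;

/* Copies one periodic report into the mirror without interpreting it. */
void StorePeriodicReport(PeriodicReportStore *s,
                         const uint8_t report[PERIODIC_REPORT_SIZE])
{
    memcpy(s->data, report, PERIODIC_REPORT_SIZE);
    s->dirty = true;
    s->reportsReceived++;
}

/* Returns true if a report was stored since the last call and clears the
 * flag, so a background task requests an NvM write only when needed. */
bool PeriodicReport_TakeDirty(PeriodicReportStore *s)
{
    bool wasDirty = s->dirty;
    s->dirty = false;
    return wasDirty;
}
```

How often the mirror is written to NvM is an integration decision; persisting every single report may cause excessive writes on flash-based NvM.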

  • P_McuFoh_ErrorNotification (Sender-Receiver interface, Mcu_Foh: Sender)

Notifies the MCU Error Handler of MCU FOH internal errors. These reports differ from the NVIDIA Thor error reports delivered by Safety Services: they report Thor errors that MCU FOH detects directly. Such an error indicates an FSI integrity error and that a failover must be initiated.

API: Rte_Send_P_McuFoh_ErrorNotification_ErrorInfo
Argument: ErrorInfo
Type: NvMCU_ErrorReportFrame_t
Range/Enum/Value: The reporter ID is MCUFOH_REPORTER_ID (0x810EU). The possible ErrorCodes are:

    • MCUFOH_E2EPROTECTERR_NOTIFICATION (0x1)

    • MCUFOH_E2ECHECKERR_NOTIFICATION (0x2)

    • MCUFOH_E2EFRAMEERR_NOTIFICATION (0x3)

    • MCUFOH_ERRID_DIO_FAILURE (0x5)

    • MCUFOH_E2EPROCTECTERR_INIT (0x6)

    • MCUFOH_E2ECHECKERR_INIT (0x7)

    • MCUFOH_SPIERR_TRANSMIT (0x8)

    • MCUFOH_SPIERR_TRANSMIT_REJECT (0x9)

Notes for integrator: The application that integrates MCU FOH should also connect to this interface to receive MCU FOH internal error information.

Error Code Details

  Error Code                          Description
  MCUFOH_E2EPROTECTERR_NOTIFICATION   Notified when E2E packetization fails for a packet to be sent to FSI FOH.
  MCUFOH_E2ECHECKERR_NOTIFICATION     Notified when the input parameters of E2E_P05Check are invalid.
  MCUFOH_E2EFRAMEERR_NOTIFICATION     Notified when the E2E check fails for a packet received from FSI FOH.
  MCUFOH_ERRID_DIO_FAILURE            Notified when setting the SPI chip-select GPIO fails.
  MCUFOH_E2EPROCTECTERR_INIT          Notified when E2E Protect initialization fails.
  MCUFOH_E2ECHECKERR_INIT             Notified when E2E Check initialization fails.
  MCUFOH_SPIERR_TRANSMIT              Notified when the previous SPI transaction failed.
  MCUFOH_SPIERR_TRANSMIT_REJECT       Notified when the call to Spi_AsyncTransmit fails.
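For logging or diagnostics, an application connected to P_McuFoh_ErrorNotification might translate received codes back to their names, as in the following sketch. The macro values are the ones listed above; the 16-bit reporter ID and 32-bit error code parameter types and the function `McuFohErrorName` are assumptions, since the layout of `NvMCU_ErrorReportFrame_t` is not given in this section.

```c
#include <stdint.h>

#define MCUFOH_REPORTER_ID                 0x810EU
#define MCUFOH_E2EPROTECTERR_NOTIFICATION  0x1U
#define MCUFOH_E2ECHECKERR_NOTIFICATION    0x2U
#define MCUFOH_E2EFRAMEERR_NOTIFICATION    0x3U
#define MCUFOH_ERRID_DIO_FAILURE           0x5U
#define MCUFOH_E2EPROCTECTERR_INIT         0x6U
#define MCUFOH_E2ECHECKERR_INIT            0x7U
#define MCUFOH_SPIERR_TRANSMIT             0x8U
#define MCUFOH_SPIERR_TRANSMIT_REJECT      0x9U

/* Expands to "case <value>: return "<macro name>";" using stringizing. */
#define MCUFOH_CASE(code) case code: return #code;

const char *McuFohErrorName(uint16_t reporterId, uint32_t errorCode)
{
    if (reporterId != MCUFOH_REPORTER_ID) {
        return "NOT_MCU_FOH";  /* report did not come from MCU FOH */
    }
    switch (errorCode) {
    MCUFOH_CASE(MCUFOH_E2EPROTECTERR_NOTIFICATION)
    MCUFOH_CASE(MCUFOH_E2ECHECKERR_NOTIFICATION)
    MCUFOH_CASE(MCUFOH_E2EFRAMEERR_NOTIFICATION)
    MCUFOH_CASE(MCUFOH_ERRID_DIO_FAILURE)
    MCUFOH_CASE(MCUFOH_E2EPROCTECTERR_INIT)
    MCUFOH_CASE(MCUFOH_E2ECHECKERR_INIT)
    MCUFOH_CASE(MCUFOH_SPIERR_TRANSMIT)
    MCUFOH_CASE(MCUFOH_SPIERR_TRANSMIT_REJECT)
    default:
        return "UNKNOWN_MCU_FOH_ERROR";  /* e.g. 0x4, which is not listed */
    }
}
```

The stringizing operator `#` turns each macro argument into a string before expansion, so the returned names cannot drift from the macro names.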

  • R_SpiRxProcess (Trigger interface)

This interface receives a trigger event when data from the SoC has been transferred over SPI and is ready to be consumed.

  • R_McuData (Sender-Receiver interface, Mcu_Foh: Receiver)

This interface receives data from the MCU application software to be sent to the FSI. The data is opaque to MCU FOH. The received data should be persisted.

API: Rte_Send_P_SR_McuData_McuData
Argument: McuData
Type: SS_McuData_t
Range/Enum/Value: array of 252 bytes
Notes for integrator: The MCU application should use this interface to send its data to the FSI.
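Before calling `Rte_Send_P_SR_McuData_McuData`, the application has to place its data, for example data read back from persistent memory, into a 252-byte `SS_McuData_t` buffer. The sketch below shows only that packing step. `PrepareMcuData`, the zero padding of unused bytes, and the treatment of `SS_McuData_t` as a plain `uint8_t[252]` are assumptions; the RTE call itself is generated for your integration and is not shown.

```c
#include <stdint.h>
#include <string.h>

/* Size of the opaque SS_McuData_t buffer. */
#define MCU_DATA_SIZE 252u

/* Copies 'len' bytes of application data into a 252-byte buffer and
 * zero-fills the remainder. Returns 0 on success, -1 if the data does
 * not fit. Zero padding is an assumption; the FSI side defines how the
 * content is interpreted. */
int PrepareMcuData(uint8_t out[MCU_DATA_SIZE], const uint8_t *src, uint32_t len)
{
    if (len > MCU_DATA_SIZE) {
        return -1;
    }
    memset(out, 0, MCU_DATA_SIZE);
    if (len > 0u) {
        memcpy(out, src, len);
    }
    return 0;
}
```

After `PrepareMcuData` succeeds, the application would pass the filled buffer to the generated send API in a single call, so MCU FOH always receives a complete 252-byte frame.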

  • P_StartMontitoring (Client-Server interface, Mcu_Foh: Server)

Invokes MCU FOH to start monitoring the Thor SoC error pin and to start SPI communication with the FSI for receiving Thor failure information.

Note

When Thor is powered off or undergoing an L2 reset, MCU FOH continues monitoring the SOC error pin but stops SPI communication with FSI FOH. An L2 reset is a hard reset triggered on Thor when an uncorrected DRAM ECC error is reported.

API: Rte_Call_R_CS_StartMontitoring_StartMonitoring
Argument: startDelay
Type: uint32
Range/Enum/Value: 0 to MAX, in ms
Notes for integrator: The MCU application that is aware of the Thor power states should connect to this interface and invoke it to start monitoring Thor when Thor is powered on in functional mode. The application should call StartMonitoring only after releasing the Thor reset (cold boot); otherwise, MCU FOH detects false failures. The application passes an appropriate startDelay: MCU FOH waits for startDelay ms before it starts SPI communication and SoC error pin monitoring. The delay accounts for the Thor boot-up time, because the SOC_ERROR pin starts toggling only a short time after the Thor reset is released.
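The start-up behavior described above can be modeled as a small state machine. The sketch below assumes that the countdown is advanced by a 1 ms periodic call and that a hypothetical flag `thorAvailable` reflects whether Thor is powered on and not in L2 reset. All type and function names are hypothetical; only the behavior (wait startDelay ms, keep pin monitoring while Thor is off or in L2 reset, stop SPI communication in that case) comes from this section.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    FOH_IDLE,        /* StartMonitoring not called yet */
    FOH_WAIT_DELAY,  /* counting down startDelay */
    FOH_MONITORING   /* start delay elapsed */
} FohStartState;

typedef struct {
    FohStartState state;
    uint32_t remainingMs;  /* remaining start delay */
    bool thorAvailable;    /* false while Thor is off or in L2 reset */
} FohStartCtx;

/* Models the StartMonitoring(startDelay) server call. */
void Foh_StartMonitoring(FohStartCtx *c, uint32_t startDelayMs)
{
    c->remainingMs = startDelayMs;
    c->state = (startDelayMs == 0u) ? FOH_MONITORING : FOH_WAIT_DELAY;
    c->thorAvailable = true;  /* called only after Thor reset release */
}

/* Called every 1 ms; ends the wait once startDelay has elapsed. */
void Foh_Tick1ms(FohStartCtx *c)
{
    if (c->state == FOH_WAIT_DELAY && --c->remainingMs == 0u) {
        c->state = FOH_MONITORING;
    }
}

/* Pin monitoring continues even while Thor is off or in L2 reset. */
bool Foh_PinMonitoringActive(const FohStartCtx *c)
{
    return c->state == FOH_MONITORING;
}

/* SPI communication additionally requires Thor to be available. */
bool Foh_SpiActive(const FohStartCtx *c)
{
    return c->state == FOH_MONITORING && c->thorAvailable;
}
```

With startDelay = 50, neither activity starts during the first 49 ticks, and both start on the 50th. How MCU FOH actually learns that Thor is powered off or in L2 reset is not described in this section.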

Source Path

Dependencies on Other Elements of NVIDIA DriveOS Software

  • NvMCU_SwModules includes the software components executed on the Safety MCU and the BootChain-Configuration Library.