Error Reporting#
The following is a list of acronyms and abbreviations used in this section:
| Acronym / Abbreviation | Definition |
|---|---|
| CCPLEX | Carmel CPU Complex. Set of Arm cores in NVIDIA DRIVE AGX Thor for user applications. |
| FSI | Functional Safety Island, running the AUTOSAR Safety Application. |
| MCU | Microcontroller unit. Infineon AURIX TC397 on the NVIDIA DRIVE AGX Thor platform. |
| HSP | Hardware synchronization primitives. Hardware block providing mailbox functionality. |
| EPD | Error Propagation Daemon. |
| EPL | Error Propagation Library. |
| EPS | Error Propagation Server. |
| SEH | System Error Handler. |
| MCUFOH | Failover Handler on Microcontroller Unit. |
| FSIFOH | Failover Handler on Functional Safety Island. |
| GOS | Guest OS partition on CCPLEX. |
| Update VM | Drive Update Virtual Machine (a Guest OS partition on CCPLEX). |
| ATF | Arm Trusted Firmware. |
| BPMP | Boot and power management processor. |
| HSI | Hardware software interface. |
| DT | Device Tree. |
Overview#
Safety Services is a software framework that detects and reads hardware errors through the HSMs and ECs defined by the Thor HSI. It also enables software elements to report software-detected errors.
The high-level block diagram below shows the interaction between the sub-elements within Safety Services:
Figure 1. Block Diagram
Error Propagation Library (EPL) provides an interface for clients running on CCPLEX to report errors. It is a dynamically linked shared object that forwards error report packets to the daemon (EPD).
Error Propagation Daemon (EPD) on AV + L is responsible for parsing the DT and sending it to FSI.
The tegra-epl kernel driver receives data from the user-space EPL. It provides an interface for different instances of the user-space Error Propagation Library to connect. It serializes error reports from all EPL instances and sends them to EPS (FSI) via TOP3_HSP.
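The serialization step described above can be modeled in a few lines. This is only a sketch under stated assumptions: the frame layout (`error_code`, `error_attribute`, `timestamp`, `reporter_id`), the class names, and the use of a Python queue are all hypothetical and stand in for whatever tegra-epl actually does; this section does not document the real wire format.

```python
import queue
import struct
import threading
from dataclasses import dataclass
from typing import List

# Hypothetical wire layout (an assumption, not the documented tegra-epl format):
# little-endian u32 error_code, u32 error_attribute, u32 timestamp, u16 reporter_id.
FRAME = struct.Struct("<IIIH")


@dataclass(frozen=True)
class ErrorReport:
    error_code: int
    error_attribute: int
    timestamp: int
    reporter_id: int

    def pack(self) -> bytes:
        return FRAME.pack(self.error_code, self.error_attribute,
                          self.timestamp, self.reporter_id)

    @classmethod
    def unpack(cls, raw: bytes) -> "ErrorReport":
        return cls(*FRAME.unpack(raw))


class ReportSerializer:
    """Funnels reports from many client instances into one ordered stream."""

    def __init__(self) -> None:
        self._pending: "queue.Queue[bytes]" = queue.Queue()

    def submit(self, report: ErrorReport) -> None:
        # May be called from any client thread; Queue handles the locking.
        self._pending.put(report.pack())

    def drain(self) -> List[bytes]:
        # Single consumer: returns the frames in the order they were queued.
        frames = []
        while True:
            try:
                frames.append(self._pending.get_nowait())
            except queue.Empty:
                return frames


def demo_serialize(clients: int = 3, reports_per_client: int = 4) -> List[ErrorReport]:
    """Run several concurrent clients and return the single serialized stream."""
    serializer = ReportSerializer()

    def client(reporter_id: int) -> None:
        for n in range(reports_per_client):
            serializer.submit(ErrorReport(0x1000 + n, 0, n, reporter_id))

    threads = [threading.Thread(target=client, args=(rid,)) for rid in range(clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [ErrorReport.unpack(frame) for frame in serializer.drain()]
```

In this model, reports from different clients may interleave, but each client's own reports keep their submission order, which is the property a single serialized channel is expected to preserve.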
Error Propagation Server (EPS) is a central entity running on FSI responsible for collecting all hardware (HSM) and software errors. EPS connects to TOP3_HSP to receive error reports from CCPLEX clients sent via EPL/tegra-epl. EPS also receives interrupts on HSM errors and reads HSM/EC registers to determine the hardware error source. EPS then reports these errors to the System Error Handler (SEH).
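Determining an error source from a status register usually comes down to walking the set bits of the value that was read. The sketch below illustrates that idea only. The bit positions and source names are hypothetical placeholders; the real HSM/EC register layout is defined by the Thor HSI and is not reproduced here.

```python
from typing import Dict, List

# Hypothetical bit-to-source mapping, for illustration only.
HYPOTHETICAL_SOURCES: Dict[int, str] = {0: "source_a", 3: "source_b", 7: "source_c"}


def set_bit_positions(status: int) -> List[int]:
    """Return the indices of all bits set in a register value, lowest first."""
    if status < 0:
        raise ValueError("register values are unsigned")
    positions = []
    bit = 0
    while status:
        if status & 1:
            positions.append(bit)
        status >>= 1
        bit += 1
    return positions


def decode_error_sources(status: int) -> List[str]:
    """Map each set bit to a source name, flagging bits with no known mapping."""
    return [HYPOTHETICAL_SOURCES.get(bit, "unknown_bit_%d" % bit)
            for bit in set_bit_positions(status)]
```

For example, a status value with bits 0, 3, and 7 set decodes to all three placeholder sources, while a bit outside the mapping is reported as unknown rather than silently dropped.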
SEH runs on FSI and is responsible for handling all the errors reported to FSI. DriveOS provides a sample dummy implementation of SEH named “SEH placeholder”. For demonstration purposes, SEH placeholder treats some error reports as critical failures and forwards them to FOH. The concrete implementation of SEH is expected to be provided by the end software integrator, that is, DRIVE AV or the OEM/Tier1.
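The placeholder behavior described above, treating a subset of reports as critical and forwarding only those, can be sketched as follows. The class name, the `(reporter_id, error_code)` report shape, and the idea of a fixed set of critical codes are assumptions made for illustration; they are not the SEH placeholder's actual policy or interface.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical report shape: (reporter_id, error_code).
Report = Tuple[int, int]


class SehPlaceholderModel:
    """Toy model: record every report, forward only the critical ones."""

    def __init__(self, critical_codes: Iterable[int],
                 forward_to_foh: Callable[[Report], None]) -> None:
        self._critical = frozenset(critical_codes)
        self._forward = forward_to_foh
        self.logged: List[Report] = []

    def on_error_report(self, report: Report) -> bool:
        """Handle one report; return True if it was forwarded as critical."""
        self.logged.append(report)
        _, error_code = report
        if error_code in self._critical:
            self._forward(report)
            return True
        return False
```

A real integrator's SEH would replace the fixed code set with its own classification policy. The point of the sketch is the split between handling every report and escalating only critical failures to FOH.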
Failover Handler on Functional Safety Island (FSI FOH) is responsible for communicating failures to the MCU via SPI and the SOC ERROR PIN. FSI FOH receives critical failure reports from SEH placeholder, sends the failure report data to MCU FOH via SPI, and asserts the SOC ERROR PIN.
Failover Handler on Microcontroller Unit (MCU FOH) is an entity running on the MCU that is responsible for communication with Safety Services running on the Thor SoC. MCU FOH receives critical error reports from FSI FOH via SPI.
The error reporting use case is demonstrated by the demo app DemoAppSwErr, which runs on CCPLEX and shows how to use EPL to report errors.
SehDriveOS#
DriveOS provides a sample dummy implementation of SEH named “SehDriveOS.” Interface block diagram for SehDriveOS:
List of Interfaces#
| Interface Name | Description |
|---|---|
| P_SEH_ReportCriticalFailure | Critical failure information is provided to FSI FOH through this interface to be sent to the MCU. |
| R_ClearError | Client port on SEH, used to clear any active error. It is provided by EPS. |
| R_ErrorReport | Receiver port on SEH, used to receive Error Report frames from EPS. |
| R_DeInitNotif | Receiver port on SEH, used to receive SC7 entry and shutdown notifications from EPS. |
| P_Deinit_Flag | Sender port on SEH, used to notify FSI FOH that all the critical failure reports have been sent to FSI FOH. |
| P_McuPeriodicReport | Sender port on SEH, used to send periodic status to the MCU. The datatype for this interface is a 252-byte array. The SEH implementation determines the data structure of the periodic report. |
| R_StateChange_Request_Rx | Receiver port on SEH, used to receive state change notifications from NvCCPLEX_FSI_App. |
| P_StateChange_Reply_Tx | Sender port on SEH, used to respond to state change notifications received from NvCCPLEX_FSI_App. It is not used in the current placeholder implementation. |
| P_TriggerHsmReset | Server port on SEH, used by the DRAM ECC error handler to notify SEH to trigger an L1 reset for performing DRAM page retirement. (Refer to the FSI-SW Delivery Package for details on the DRAM ECC error handling sequence.) |
| R_PerformHsmReset | Client port on SEH, used to perform an L1 reset by triggering the interface provided by EPS. |
| R_SocErrPinAssrt | Client port on SEH, used to assert the SOC error pin for software-reported errors. |
| R_DramEccEh | Client port on SEH, used to trigger the DRAM ECC uncorrected error handler when SEH receives a DRAM ECC uncorrected error notification. |
| R_McuData | Receiver port for receiving data from the MCU Application. The data from the MCU Application could be data from persistent memory on the MCU. |
| R_Swt_Service_SEH | Client port, used to read the time stamp from Swt Cdd. |
| R_NvHspMailboxConsume | Client port on SEH, used for polling QM errors from CCPLEX. |
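
For P_McuPeriodicReport, the table above fixes only the size of the report (a 252-byte array) and leaves the contents to the SEH implementation. The sketch below shows one way an integrator might fill such a buffer. The header fields (sequence counter, state, active error count) and the function names are purely hypothetical; only the 252-byte size comes from the interface description.

```python
import struct
from typing import Tuple

PERIODIC_REPORT_SIZE = 252  # size stated in the interface description

# Hypothetical header: u32 sequence counter, u8 state, u16 active error count.
HEADER = struct.Struct("<IBH")


def build_periodic_report(sequence: int, state: int, active_errors: int,
                          payload: bytes = b"") -> bytes:
    """Pack the hypothetical header plus payload, zero-padded to 252 bytes."""
    body = HEADER.pack(sequence & 0xFFFFFFFF, state, active_errors) + payload
    if len(body) > PERIODIC_REPORT_SIZE:
        raise ValueError("report exceeds %d bytes" % PERIODIC_REPORT_SIZE)
    return body.ljust(PERIODIC_REPORT_SIZE, b"\x00")


def parse_periodic_report(raw: bytes) -> Tuple[int, int, int]:
    """Recover (sequence, state, active_errors) from a full-size report."""
    if len(raw) != PERIODIC_REPORT_SIZE:
        raise ValueError("expected exactly %d bytes" % PERIODIC_REPORT_SIZE)
    return HEADER.unpack_from(raw)
```

Padding to the full size keeps every report the same length on the wire, so the receiving side on the MCU can validate the length before interpreting any field.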
Service Ports#
| Port Name | Description |
|---|---|
| P_ESH_DOS_State_Request_BswM | It is the interface to request DOS states of BswM. |
| P_ESH_User_Request_BswM | It is the interface to release the current BswM state. |
| R_ESH_Mode | It is the interface to read the current BswM state. |
Note
Refer to the FSI Software release document for details on BswM states. These service port interfaces are optional.