Debugging and Profiling

DriveOS Linux has a dedicated eMMC partition (gos0-crashlogs) where Linux Guest OS kernel panic/OOPS logs are stored. On systems where kernel logs are not available over UART, such as production systems, this feature stores kernel panic/OOPS logs in non-volatile storage so that they can be retrieved and analyzed on subsequent boots.

This feature relies on:

  • pstore functionality in Linux Guest OS kernel

  • systemd-pstore service in Linux Guest OS

  • eMMC partition (accessed by Linux Guest OS as a virtual storage partition)

  • nvidia,tegra-hv-oops-storage driver
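The first dependency in the list above can be checked from a shell. The following is a minimal sketch, assuming the pstore filesystem is exposed through the standard Linux mounts table (/proc/mounts). The function name pstore_mount_points is hypothetical and is not part of DriveOS.

```shell
#!/bin/sh
# Hypothetical helper (not a DriveOS tool): print the mount point of every
# pstore filesystem listed in a mounts table.
# Usage: pstore_mount_points [MOUNTS_FILE]   (defaults to /proc/mounts)
pstore_mount_points() {
    mounts=${1:-/proc/mounts}
    # Mounts table fields: device, mount point, filesystem type, options, ...
    # Exit status is 0 if at least one pstore entry was found, 1 otherwise.
    awk '$3 == "pstore" { print $2; found = 1 } END { exit !found }' "$mounts"
}
```

Calling pstore_mount_points with no argument reads /proc/mounts. A non-zero exit status suggests that the pstore filesystem is not mounted, so the other components would have nothing to retrieve logs from.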

Because of how virtual storage partitions work, there are situations in which panic/OOPS logs cannot be stored. At a minimum, the following components must be functional when the Linux Guest OS kernel tries to store panic/OOPS logs: IVC, Mempool, Hypervisor, and Storage Server. If the Linux Guest OS generates OOPS/panic logs as an indirect effect of any of these components not functioning, the logs are not stored. For example, suppose Storage Server stops responding to data requests for user storage partitions (which the Linux Guest OS also accesses as virtual storage partitions) because of an internal error. In this case, the Linux Guest OS kernel can generate OOPS/panic logs, but they are not stored in the gos0-crashlogs partition.

For additional information regarding how Linux Guest OS panic/OOPS logs are stored and retrieved in a subsequent boot cycle, refer to Storage Server.

By default, when the Linux Guest OS is operating normally (and has operated normally since DriveOS was flashed), there are no panic/OOPS logs:

root@tegra-ubuntu:/home/nvidia# ls -al /var/lib/systemd/pstore/
ls: cannot access '/var/lib/systemd/pstore/': No such file or directory

Each time the Linux Guest OS kernel panics or hits an OOPS, the pstore kernel subsystem automatically stores the logs in the gos0-crashlogs partition, with a separate file for each instance. These files become visible only after a reboot; they are not visible in the current boot cycle, even if the system remains functional after an OOPS. On the subsequent boot, the systemd-pstore service, in conjunction with the pstore kernel subsystem, retrieves the logs and stores them in the rootfs. Users do not need to read or write the gos0-crashlogs partition directly; they only need to access the rootfs directory maintained by the systemd-pstore service.
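The check for archived logs can be scripted so that a missing directory is reported as "no logs" rather than as an error. The following is a minimal sketch; the function name list_crash_logs is hypothetical, and the default path is the /var/lib/systemd/pstore/ directory used in the console output in this section.

```shell
#!/bin/sh
# Hypothetical helper (not a DriveOS tool): list archived panic/OOPS logs.
# Usage: list_crash_logs [DIR]   (DIR defaults to /var/lib/systemd/pstore)
# Returns 0 and prints one file path per line if logs exist, 1 otherwise.
list_crash_logs() {
    dir=${1:-/var/lib/systemd/pstore}
    # The directory may not exist if no panic/OOPS has been archived yet.
    if [ ! -d "$dir" ]; then
        echo "no crash logs: $dir does not exist"
        return 1
    fi
    # Collect regular files at any depth, in a stable order.
    found=$(find "$dir" -type f | LC_ALL=C sort)
    if [ -z "$found" ]; then
        echo "no crash logs: $dir is empty"
        return 1
    fi
    printf '%s\n' "$found"
}
```

Because the archived files are readable only by root in the example output in this section, the function would normally be run as root.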

The following example shows the result of a kernel panic followed by a DriveOS reboot. After the reboot, files are present under the /var/lib/systemd/pstore/ directory:

root@tegra-ubuntu:/home/nvidia# ls -al /var/lib/systemd/pstore/
total 168
drwxr-xr-x  2 root root  4096 Jan 10 04:56 .
drwxr-xr-x 10 root root  4096 Jan 10 04:56 ..
-rw-------  1 root root 78611 Mar 18 18:23 dmesg-tegra_hv_vblk_oops-0
-rw-r-----  1 root root 78639 Jan 10 04:56 dmesg.txt

root@tegra-ubuntu:/home/nvidia# tail -20 /var/lib/systemd/pstore/dmesg-tegra_hv_vblk_oops-0
<4>[  181.650674]  dump_backtrace+0x0/0x1d0
<4>[  181.650683]  show_stack+0x2c/0x40
<4>[  181.650686]  dump_stack+0xd8/0x138
<4>[  181.650692]  panic+0xd0/0x3a4
<4>[  181.650694]  sysrq_reset_seq_param_set+0x0/0xa0
<4>[  181.650700]  __handle_sysrq+0x90/0x1a0
<4>[  181.650702]  write_sysrq_trigger+0x144/0x250
<4>[  181.650704]  proc_reg_write+0xc4/0x110
<4>[  181.650708]  vfs_write+0xc0/0x3f0
<4>[  181.650711]  ksys_write+0x78/0x100
<4>[  181.650713]  __arm64_sys_write+0x24/0x30
<4>[  181.650716]  el0_svc_common.constprop.0+0x7c/0x1c0
<4>[  181.650719]  do_el0_svc+0x34/0xa0
<4>[  181.650722]  el0_svc+0x1c/0x30
<4>[  181.650724]  el0_sync_handler+0xa8/0xb0
<4>[  181.650725]  el0_sync+0x16c/0x180
<2>[  181.756669] SMP: stopping secondary CPUs
<0>[  181.757182] Kernel Offset: disabled
<0>[  181.757615] CPU features: 0x0040006,4800a238
<0>[  181.758168] Memory Limit: none