NCC Health Check: ahv_crash_file_check

Description

The NCC health check ahv_crash_file_check reports whether AHV host crash dumps are detected on any host in the cluster.

This check was introduced in NCC 3.5.1.

Running the NCC Check

Run this check as part of the complete NCC Health Checks.

nutanix@cvm$ ncc health_checks run_all

Or run this check separately.

nutanix@cvm$ ncc health_checks hypervisor_checks ahv_crash_file_check

As of NCC 3.0, you can also run the checks from the Prism web console Health page: select Actions > Run Checks. Select All checks and click Run.

In NCC releases older than 4.0.1, this check looked for any crash dump file in the AHV host's /var/crash directory. Starting with NCC 4.0.1, only crash dump files created in the last 7 days are checked.
If crash dumps are found, NCC reports a WARN status that references the specific files found.
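
The 7-day window described above can be approximated manually on an AHV host. The following is a minimal sketch, not the NCC implementation: the function name recent_crash_files and its parameters are hypothetical, and it uses file modification time (find -mtime) as a stand-in for creation time.

```shell
# Hypothetical helper: list files under a crash directory that were
# modified within the last N days (default: /var/crash, 7 days).
# This only approximates the NCC check; NCC's own logic may differ.
recent_crash_files() {
    crash_dir="${1:-/var/crash}"
    days="${2:-7}"
    # -type f matches regular files at any depth;
    # -mtime -N matches files modified less than N 24-hour periods ago.
    find "$crash_dir" -type f -mtime "-$days"
}
```

Running recent_crash_files with no arguments on the AHV host lists files under /var/crash that fall inside the 7-day window. Empty output suggests that any remaining dumps are older than 7 days.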

This check applies only to the AHV hypervisor.

This check is scheduled to run every day.

This check generates an alert starting from NCC 4.6.2.

Sample output

For Status: WARN

The following is an example of the check output when an AHV Kernel Crash Dump is detected.

Running : health_checks hypervisor_checks ahv_crash_file_check
[==================================================] 100%
/health_checks/hypervisor_checks/ahv_crash_file_check            [ WARN ]
------------------------------------------------------------------------+
Detailed information for ahv_crash_file_check:
Node x.x.x.x:
WARN: Found the following crash file(s) at x.x.x.x: {name_of_dump_file}.
Refer to KB 4866 (http://portal.nutanix.com/kb/4866) for details on ahv_crash_file_check or Recheck with: ncc health_checks hypervisor_checks ahv_crash_file_check --cvm_list=x.x.x.x

You may also see the following messages in Prism / Health.

"Found kernel crash file(s) on AHV host(s)."
"Notify Nutanix support to investigate the kernel issues."
"Kernel issue may affect hypervisor functionalities."
"Found the following crash file"
"Recent AHV Crash File Detected on node"

Output messaging

Check ID            11053
Description         Check if /var/crash is empty.
Causes of failure   Found kernel crash file(s) on AHV host(s).
Resolutions         Notify Nutanix support to investigate the kernel issues.
Impact              Kernel issues may affect hypervisor functionalities.

Solution

If AHV Kernel Crash Dumps are detected, contact Support to investigate what produced the dump file and to get guidance on avoiding a recurrence.

When raising a case with Support, include the following:

  • NCC check output.
  • List of files with timestamps:
    [root@ahvhost ~]# ls -lahtr /var/crash/
  • A copy of the specific crash dump file.

Note: In rare cases, a warning alert is raised without listing any crash file names:

"Failed to perform crash file check at x.x.x.x: "

Run the "ls -lahtr /var/crash/" command on the affected AHV host. If no crash files are found, resolve the alert.
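
The manual verification above can also be scripted. The following is a minimal sketch under stated assumptions: the function name crash_dir_has_no_files is hypothetical, and the script only reports whether any regular file exists under the directory. It does not resolve the alert.

```shell
# Hypothetical helper: succeed (exit status 0) when the crash directory
# contains no regular files, fail (exit status 1) otherwise.
crash_dir_has_no_files() {
    crash_dir="${1:-/var/crash}"
    # Stop at the first file found; empty output means no files exist.
    # A missing or unreadable directory is also treated as "no files".
    [ -z "$(find "$crash_dir" -type f 2>/dev/null | head -n 1)" ]
}

if crash_dir_has_no_files /var/crash; then
    echo "No crash files found"
else
    echo "Crash files present"
fi
```

If the script reports that no crash files were found, the alert can be resolved as described above. If files are present, gather the details listed for a Support case instead.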

Collecting Additional Information

  • Before collecting additional information, upgrade NCC. For information on upgrading NCC, refer to Nutanix KB 2871.
  • Upload the NCC output file ncc-output-latest.log, which is created when the NCC check is run. Refer to Nutanix KB 2871 for details on running NCC and collecting this file.
  • Collect a Logbay bundle using the following command. For more information on Logbay, see Nutanix KB 6691.
nutanix@cvm$ logbay collect --aggregate=true

Additional Information

Document ID: HT516510
Original Publish Date: 05/21/2024
Last Modified Date: 05/23/2024