Description
The NCC health check pcvm_disk_usage_check verifies that data disk and system partition usage on the Prism Central (PC) VM is within limits.
This check has the following parts:
- Checking the individual data disk usage (added in NCC 3.5.1):
- If usage is more than 75% for several hours, a WARNING is returned identifying the disk.
- If usage is more than 90% for several hours, a FAIL is returned identifying the disk.
- Checking the overall data disk usage (added in NCC 3.10.1):
- If overall usage is more than 90% for several hours, a WARNING is returned.
- Checking the Prism Central VM system root partition usage (added in NCC 3.9.4). A FAIL is returned only if the partition usage exceeds 95%.
- Checking the Prism Central VM home partition usage (added in NCC 3.9.4):
- If the usage is more than 75%, a WARNING is returned.
- If the usage is more than 90%, a FAIL is returned.
- Checking the Prism Central VM CMSP partition usage (added in NCC 3.10.1):
- If usage is more than 75%, a WARNING is returned.
- If the usage is more than 90%, a FAIL is returned.
- Checking the Prism Central VM Upgrade disk partition usage (added in NCC 4.6.0):
- If the usage is more than 70%, a FAIL is returned.
- This check runs every 5 minutes.
- If there are more than 5 consecutive failures (approximately 30 minutes), a critical alert is raised.
Note: If you are running LCM-2.6 or LCM-2.6.0.1, LCM log collection may fill up the /home directory. Refer to KB-14671 for the workaround.
Running the NCC check
Run the NCC check as part of the complete NCC health checks.
nutanix@pcvm$ ncc health_checks run_all
Or run the pcvm_disk_usage_check check separately.
nutanix@pcvm$ ncc health_checks system_checks pcvm_disk_usage_check
You can also run the checks from the Prism Web Console Health page: select Actions > Run Checks. Select All checks and click Run.
This check is scheduled to run every 5 minutes, by default.
This check will generate an alert after 5 consecutive failures across scheduled intervals.
Sample Outputs
For Status: PASS
Running : health_checks system_checks pcvm_disk_usage_check
[==================================================] 100%
/health_checks/system_checks/pcvm_disk_usage_check [ PASS ]
-------------------------------------------------------------------------------+
+---------------+
| State | Count |
+---------------+
| Pass | 1 |
| Total | 1 |
+---------------+
Plugin output written to /home/nutanix/data/logs/ncc-output-latest.log
For Status: WARN (on Prism Central VM data disk, e.g. /dev/sdc1)
Running : health_checks system_checks pcvm_disk_usage_check
[==================================================] 100%
/health_checks/system_checks/pcvm_disk_usage_check [ WARN ]
-------------------------------------------------------------------------------+
Detailed information for pcvm_disk_usage_check:
Node x.x.x.x:
WARN: Prism Central VM x.x.x.x disk usage exceeds warning limit 75 % for disks: /dev/sdc1(/home/nutanix/data/stargate-storage/disks/NFS_2_0_283_5a853328_a7fa_45a4_b3d2_6f91cffaa653).
Refer to KB 5228 (http://portal.nutanix.com/kb/5228) for details on disk_usage_check or Recheck with: ncc health_checks system_checks pcvm_disk_usage_check --cvm_list=x.x.x.x
+-----------------------+
| State | Count |
+-----------------------+
| Warning | 1 |
| Total Plugins | 1 |
+-----------------------+
Plugin output written to /home/nutanix/data/logs/ncc-output-latest.log
For Status: WARN (on Prism Central VM overall MultiVDisk)
Running : health_checks system_checks pcvm_disk_usage_check
[==================================================] 100%
/health_checks/system_checks/pcvm_disk_usage_check [ WARN ]
-------------------------------------------------------------------------------+
Detailed information for pcvm_disk_usage_check:
Node x.x.x.x:
WARN: Prism Central VM x.x.x.x overall MultiVDisk usage exceeds warning limit of 2321329924 KB.
Refer to KB 5228 (http://portal.nutanix.com/kb/5228) for details on disk_usage_check or Recheck with: ncc health_checks system_checks pcvm_disk_usage_check --cvm_list=x.x.x.x
+-----------------------+
| State | Count |
+-----------------------+
| Warning | 1 |
| Total Plugins | 1 |
+-----------------------+
Plugin output written to /home/nutanix/data/logs/ncc-output-latest.log
For Status: FAIL (on Prism Central VM data disk, e.g. /dev/sdc1)
Running : health_checks system_checks pcvm_disk_usage_check
[==================================================] 100%
/health_checks/system_checks/pcvm_disk_usage_check [ FAIL ]
-------------------------------------------------------------------------------+
Detailed information for pcvm_disk_usage_check:
Node x.x.x.x:
WARN: Prism Central VM x.x.x.x disk usage exceeds critical limit 90 % for disks: /dev/sdc1(/home/nutanix/data/stargate-storage/disks/NFS_2_0_283_5a853328_a7fa_45a4_b3d2_6f91cffaa653).
Refer to KB 5228 (http://portal.nutanix.com/kb/5228) for details on disk_usage_check or Recheck with: ncc health_checks system_checks pcvm_disk_usage_check --cvm_list=x.x.x.x
+-----------------------+
| State | Count |
+-----------------------+
| Fail | 1 |
| Total Plugins | 1 |
+-----------------------+
Plugin output written to /home/nutanix/data/logs/ncc-output-latest.log
For Status: FAIL (on root partition, i.e. /)
Running : health_checks system_checks pcvm_disk_usage_check
[==================================================] 100%
/health_checks/system_checks/pcvm_disk_usage_check [ FAIL ]
-------------------------------------------------------------------------------+
Detailed information for pcvm_disk_usage_check:
Node x.x.x.x:
FAIL: PC VM root partition x.x.x.x disk usage exceeds critical limit 95 % for disks: 97%.
Refer to KB 5228 (http://portal.nutanix.com/kb/5228) for details on disk_usage_check or Recheck with: ncc health_checks system_checks pcvm_disk_usage_check --cvm_list=x.x.x.x
+-----------------+
| State | Count |
+-----------------+
| Fail | 1 |
| Total | 1 |
+-----------------+
Plugin output written to /home/nutanix/data/logs/ncc-output-latest.log
For Status: WARN (on Prism Central VM home partition, i.e. /home)
Running : health_checks system_checks pcvm_disk_usage_check
[==================================================] 100%
/health_checks/system_checks/pcvm_disk_usage_check [ WARN ]
-------------------------------------------------------------------------------+
Detailed information for pcvm_disk_usage_check:
Node x.x.x.x:
WARN: Prism Central VM x.x.x.x home partition disk usage exceeds warning limit 75 %.
Refer to KB 5228 (http://portal.nutanix.com/kb/5228) for details on disk_usage_check or Recheck with: ncc health_checks system_checks pcvm_disk_usage_check --cvm_list=x.x.x.x
+-----------------------+
| State | Count |
+-----------------------+
| Warning | 1 |
| Total Plugins | 1 |
+-----------------------+
Plugin output written to /home/nutanix/data/logs/ncc-output-latest.log
For Status: FAIL (on Prism Central VM home partition, i.e. /home)
Running : health_checks system_checks pcvm_disk_usage_check
[==================================================] 100%
/health_checks/system_checks/pcvm_disk_usage_check [ FAIL ]
-------------------------------------------------------------------------------+
Detailed information for pcvm_disk_usage_check:
Node x.x.x.x:
WARN: Prism Central VM x.x.x.x home partition disk usage exceeds critical limit 90 %.
Refer to KB 5228 (http://portal.nutanix.com/kb/5228) for details on disk_usage_check or Recheck with: ncc health_checks system_checks pcvm_disk_usage_check --cvm_list=x.x.x.x
+-----------------------+
| State | Count |
+-----------------------+
| Fail | 1 |
| Total Plugins | 1 |
+-----------------------+
Plugin output written to /home/nutanix/data/logs/ncc-output-latest.log
For Status: WARN (on Prism Central VM CMSP partition, i.e. /dev/sde)
Running : health_checks system_checks pcvm_disk_usage_check
[==================================================] 100%
/health_checks/system_checks/pcvm_disk_usage_check [ WARN ]
-------------------------------------------------------------------------------+
Detailed information for pcvm_disk_usage_check:
Node x.x.x.x:
WARN: Platform disk space usage in Prism Central VM x.x.x.x exceeds 75% for disk(s): /dev/sde
Refer to KB 5228 (http://portal.nutanix.com/kb/5228) for details on pcvm_disk_usage_check or Recheck with: ncc health_checks system_checks pcvm_disk_usage_check --cvm_list= x.x.x.x
+-----------------------+
| State | Count |
+-----------------------+
| Warning | 1 |
| Total Plugins | 1 |
+-----------------------+
Plugin output written to /home/nutanix/data/logs/ncc-output-latest.log
For Status: FAIL (on Prism Central VM CMSP partition, i.e. /dev/sde)
Running : health_checks system_checks pcvm_disk_usage_check
[==================================================] 100%
/health_checks/system_checks/pcvm_disk_usage_check [ FAIL ]
-------------------------------------------------------------------------------+
Detailed information for pcvm_disk_usage_check:
Node x.x.x.x :
FAIL: Platform disk space usage in Prism Central VM x.x.x.x exceeds 90% for disk(s): /dev/sde
Refer to KB 5228 (http://portal.nutanix.com/kb/5228) for details on pcvm_disk_usage_check or Recheck with: ncc health_checks system_checks pcvm_disk_usage_check --cvm_list= x.x.x.x
+-----------------------+
| State | Count |
+-----------------------+
| Fail | 1 |
| Total Plugins | 1 |
+-----------------------+
Plugin output written to /home/nutanix/data/logs/ncc-output-latest.log
For Status: FAIL (on Prism Central VM upgrade disk partition, i.e. /home/nutanix/upgrade)
Running : health_checks system_checks pcvm_disk_usage_check
[==================================================] 100%
/health_checks/system_checks/pcvm_disk_usage_check [ FAIL ]
-------------------------------------------------------------------------------+
Detailed information for pcvm_disk_usage_check:
Node x.x.x.x:
FAIL: Prism Central VM x.x.x.x upgrade disk usage exceeds critical limit 70 %.
Refer to KB 5228 (http://portal.nutanix.com/kb/5228) for details on pcvm_disk_usage_check or Recheck with: ncc health_checks system_checks pcvm_disk_usage_check --cvm_list=x.x.x.x
+-----------------------+
| State | Count |
+-----------------------+
| Fail | 1 |
| Total Plugins | 1 |
+-----------------------+
Plugin output written to /home/nutanix/data/logs/ncc-output-latest.log
Note: All commands in this article that are meant to be run on the PC assume that you are logged in to the PC VM via SSH.
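For example, from a workstation with network access to the Prism Central VM (the IP address below is a placeholder):
$ ssh nutanix@<pcvm_ip_address>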
Checking Disk Usage in PC VM
Following is an example of how to check disk usage on a PC VM.
nutanix@pcvm$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sdb1 9.8G 7.2G 2.2G 78% /
devtmpfs 7.9G 0 7.9G 0% /dev
tmpfs 7.9G 16K 7.9G 1% /dev/shm
tmpfs 7.9G 428K 7.9G 1% /run
tmpfs 7.9G 0 7.9G 0% /sys/fs/cgroup
/dev/sdb3 40G 4.4G 35G 12% /home
/dev/sdc1 493G 431G 57G 69% /home/nutanix/data/stargate-storage/disks/NFS_1_0_450_823394be_0c7b_4f18_8335_71bae1bc6c82
tmpfs 1.6G 0 1.6G 0% /run/user/1000
The upgrade disk partition would show up similar to the following:
/dev/sde 30G 26G 4.1G 87% /home/nutanix/upgrade
Output messaging
Check ID | 101059
Description | Check that disk space usage on the Prism Central VM is within limits.
Causes of failure | High disk usage in the Prism Central VM.
Resolutions | Refer to KB 5228 for further details.
Impact | Prism Central VM may run out of storage space to store data.
Alert ID | A101059
Alert Title | Prism Central VM disk usage high
Alert Message | Prism Central VM IP disk usage exceeds warning limit x % for disks: disks.
Prism Central VM IP overall MultiVDisk usage exceeds warning limit of 2321329924 KB.
Prism Central VM IP disk usage exceeds critical limit x % for disks: disks.

Check ID | 200316
Description | Checks if Prism Central system root partition usage is within the threshold to ensure uninterrupted operations.
Causes of failure | Increased Prism Central VM system root partition usage due to excessive logging or incomplete maintenance operation.
Resolutions | Reduce Prism Central VM system root partition usage by removing any known temporary or unneeded files. Refer to KB 5228 for further details.
Impact | If the Prism Central VM system root partition is highly utilised, certain maintenance operations, such as upgrades, may be impacted. If the Prism Central VM system root partition is 100% utilised, services may stop and impact Prism Central cluster management functions.
Alert ID | A200316
Alert Title | Prism Central VM System Root Partition Space Usage High
Alert Message | Disk space usage for root partition mount_path on entity-ip_address has exceeded threshold%.

Check ID | 200317
Description | Checks if Prism Central home partition usage is within the threshold to ensure uninterrupted operations.
Causes of failure | Increased Prism Central VM home partition usage due to excessive logging or incomplete maintenance operation.
Resolutions | Reduce Prism Central VM home partition usage by removing any known temporary or unneeded files. Refer to KB 5228 for further details.
Impact | If the Prism Central VM home partition is highly utilised, then certain maintenance operations, such as upgrades, may be impacted. If the Prism Central VM home partition is 100% utilised, then services may stop and impact cluster storage availability.
Alert ID | A200317
Alert Title | Prism Central VM home partition disk usage high
Alert Message | Prism Central VM IP home partition disk usage exceeds warning limit x %.
Prism Central VM IP home partition disk usage exceeds critical limit x %.

Check ID | 200328
Description | Check that platform disk space usage on the Prism Central VM is within limits.
Causes of failure | High disk usage in the Prism Central VM.
Resolutions | Refer to KB 5228 for further details.
Impact | Prism Central VM may run out of storage space to store data.
Alert ID | A200328
Alert Title | Prism Central VM platform disk space usage high
Alert Smart Title | Prism Central VM svm_ip platform disk space usage high
Alert Message | Platform disk space usage in Prism Central VM svm_ip exceeds percentage_exceed% for disk(s): disk_paths.

Check ID | 200334
Description | Checks if Prism Central upgrade disk usage is within the threshold to ensure uninterrupted upgrade operations.
Causes of failure | Increased Prism Central VM upgrade disk usage due to the presence of multiple Prism Central Installer files.
Resolutions | Reduce Prism Central VM upgrade partition usage by removing the Prism Central Installer files which are not needed. Refer to KB 5228 for further details.
Impact | If the Prism Central VM upgrade disk is highly utilised, then the Prism Central upgrade would fail due to lack of space in the upgrade disk.
Alert ID | A200334
Alert Title | Prism Central VM upgrade disk usage
Alert Message | Prism Central VM <IP> upgrade disk usage exceeds critical limit x%.
Scenarios that trigger pcvm_disk_usage_check WARN/FAIL on the /home partition
Scenario 1
The /home directory on long-running PC instances might reach close to its maximum limit as older Prism code is not cleaned up:
nutanix@pcvm$ cat ~/config/upgrade.history
Thu, 17 Dec 2020 08:51:43 el7.3-release-euphrates-5.19-stable-b2ab98294375c3f24f4d813b83ffcb43d85ebcc1
Tue, 19 Jan 2021 11:53:43 el7.3-release-euphrates-5.19-stable-aadf03fd084cb00f0414f84549b7ebbe9691a984
Wed, 24 Feb 2021 08:53:13 el7.3-release-euphrates-5.19-stable-ddf5fcc232b693ae965280668b10d0337ce99281
Mon, 19 Apr 2021 07:03:39 el7.3-release-euphrates-5.19-stable-6d6cec7de63c8fd117eeb59162031d03c2faf548
Mon, 26 Apr 2021 07:00:07 el7.3-release-euphrates-5.19-stable-3927829dad6a930e67f2f4a47e752df5a8f6c64d
Tue, 01 Jun 2021 10:15:14 el7.3-release-euphrates-5.19-stable-db974bded2c0cd1037288ca7aa9aef6f5e441222
Mon, 14 Jun 2021 09:47:29 el7.3-release-fraser-6.0-stable-a48467616ee7c603e3cee3174779cf24bea227cb
Thu, 01 Jul 2021 11:52:24 el7.3-release-fraser-6.0-stable-0601c1f41bad35bf4afe05da443947d34927c6ae
Thu, 05 Aug 2021 09:16:28 el7.3-release-fraser-6.0-stable-b9dbe4a0b0876cffa23d268d8ddc7f272fa4a166
Wed, 01 Sep 2021 07:44:46 el7.3-release-fraser-6.0-stable-f948d198de58b1b1e511431dbef0b34d20c82739
nutanix@pcvm$ sudo du -sh /home/apache/www/console/el7.3-release-*
304M el7.3-release-euphrates-5.18.1.1-stable-4546d2908cb8495b316deb45de63b7f5e52541a1
541M el7.3-release-euphrates-5.18.1.2-stable-b1b096696c0c034570545912a00d39746e901f36
675M el7.3-release-euphrates-5.19.1.5-stable-0f9e00f661436fef1af18a094089744f34ccd8c0
1.1G el7.3-release-euphrates-5.19.1.6-stable-a1bbd4f054f86b9d445bf2153b93c5d8d920cff7
629M el7.3-release-euphrates-5.19.1-stable-6edca74801c9db2ff2003780084bb12aa6aa29f4
694M el7.3-release-euphrates-5.19.2-stable-8e7da6324cbe5c34564ec51615b10a7737c6782a
1.1G el7.3-release-euphrates-5.19-stable-5282152e02f3ede70f0957217a62dc436c60b454
329M el7.3-release-euphrates-5.20.1.1-stable-726ea8f7dc4bca156d3e3f63cd7982eecb70c8cb
2.1G el7.3-release-fraser-6.0.1.1-stable-d9f94c47b63e3eb4179dd7a6e16202d5856581a6
The issue has been addressed and fixed in the releases: pc.2022.1, pc.2022.4, pc.2021.9.0.5 and later. For more details and the workaround, please refer to the corresponding solution section.
Scenario 2
Hyperkube logs (kubelet logs) are not being cleaned up after enabling CMSP / microservices on Prism Central.
If Cluster Maintenance Utilities (CMU) has been updated to version 2.0.3 using LCM, the included Scavenger version is missing the capability to clean up certain logs related to CMSP microservices. As a result of this issue, Prism Central services may not start, or users may not be able to log in to the PC UI. The issue first fills up the /home/nutanix/data/sys-storage/NFS_.../ directory, after which the kubelet logs start filling up the root partition via the /tmp folder.
nutanix@pcvm$ df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 34G 0 34G 0% /dev
tmpfs 34G 52K 34G 1% /dev/shm
tmpfs 34G 3.4M 34G 1% /run
tmpfs 34G 0 34G 0% /sys/fs/cgroup
/dev/sdb2 9.8G 9.8G 0G 100% /
/dev/sdb3 50G 33G 16G 68% /home
tmpfs 6.7G 0 6.7G 0% /run/user/1000
/dev/sdf1 2.5T 21G 2.4T 1% /home/nutanix/data/stargate-storage/disks/NFS_6708977956_4f2835fa_ab29_41c5_9110_483bff268ca0
/dev/sdg1 2.5T 13G 2.4T 1% /home/nutanix/data/stargate-storage/disks/NFS_6708977958_10aa3f76_65a5_4fa6_8c88_7c70a4504f29
/dev/sde1 2.5T 20G 2.4T 1% /home/nutanix/data/stargate-storage/disks/NFS_6708977954_df3a5816_b14b_4098_9b58_d90d670781a1
/dev/sdc1 2.5T 12G 2.4T 1% /home/nutanix/data/stargate-storage/disks/NFS_6708977948_1bd3cd0d_de69_4a98_a18d_6049945e261b
/dev/sdd 98G 88G 5.4G 100% /home/nutanix/data/kafka/disks/NFS_6708977950_cd98c6f5_c534_486a_a939_4f40bffd986c
We may find hyperkube.ntnx* logs being generated and not rotated correctly when CMSP is enabled. Check whether the directory /home/nutanix/data/sys-storage/NFS.../kubelet/ is excessively filled with these logs.
nutanix@pcvm:~$ du -hsx /home/nutanix/data/sys-storage/NFS*/kubelet/
97.0G /home/nutanix/data/sys-storage/NFS_6708977950_cd98c6f5_c534_486a_a939_4f40bffd986c/kubelet/
nutanix@pcvm:~$ ls -l /home/nutanix/data/sys-storage/NFS_4_0_7036_6578653c_8a38_4af8_9649_42e7939f3656/kubelet/kubelet* | wc -l
98
When working as intended, we should see less than 10 GB used in this kubelet folder, and the file count for kubelet* should be less than approximately 15.
We may additionally see similar kubelet logs filling up space in /tmp, after space in the /home/nutanix/data/sys-storage/NFS... directory has been exhausted.
nutanix@pcvm:~$ sudo du -hsx /tmp
2.8G /tmp
nutanix@pcvm:~/tmp$ sudo ls -larth /tmp/
total 2.8G
***truncated***
-rw-r--r--. 1 root root 109K Jun 23 05:49 kubelet.ntnx-ww-xx-yy-zz-a-pcvm.root.log.INFO.20230623-054920.15123
-rw-r--r--. 1 root root 1.4K Jun 23 05:50 kubelet.ntnx-ww-xx-yy-zz-a-pcvm.root.log.WARNING.20230623-055012.17214
-rw-r--r--. 1 root root 114K Jun 23 05:50 kubelet.ntnx-ww-xx-yy-zz-a-pcvm.root.log.INFO.20230623-055011.17214
-rw-r--r--. 1 root root 1.4K Jun 23 05:50 kubelet.ntnx-ww-xx-yy-zz-a-pcvm.root.log.WARNING.20230623-055038.18217
-rw-r--r--. 1 root root 114K Jun 23 05:50 kubelet.ntnx-ww-xx-yy-zz-a-pcvm.root.log.INFO.20230623-055038.18217
-rw-r--r--. 1 root root 109K Jun 23 05:51 kubelet.ntnx-ww-xx-yy-zz-a-pcvm.root.log.INFO.20230623-055106.19499
-rw-r--r--. 1 root root 1.8G Jun 24 03:20 kubelet.ntnx-ww-xx-yy-zz-a-pcvm.root.log.INFO.20230623-055151.21218
-rw-r--r--. 1 root root 60K Jun 24 15:06 kubelet.ntnx-ww-xx-yy-zz-a-pcvm.root.log.ERROR.20230623-055155.21218
-rw-r--r--. 1 root root 5.8M Jun 24 15:10 kubelet.ntnx-ww-xx-yy-zz-a-pcvm.root.log.WARNING.20230623-055152.21218
-rw-r--r--. 1 root root 990M Jun 24 15:10 kubelet.ntnx-ww-xx-yy-zz-a-pcvm.root.log.INFO.20230624-032057.21218
-rw-r--r--. 1 root root 103K Jun 24 23:59 kubelet.ntnx-ww-xx-yy-zz-a-pcvm.root.log.INFO.20230624-235940.170513
-rw-r--r--. 1 root root 3.3K Jun 25 00:01 kubelet.ntnx-ww-xx-yy-zz-a-pcvm.root.log.ERROR.20230625-000123.175052
-rw-r--r--. 1 root root 6.7K Jun 25 00:01 kubelet.ntnx-ww-xx-yy-zz-a-pcvm.root.log.WARNING.20230625-000120.175052
-rw-r--r--. 1 root root 2.8M Jun 25 00:01 kubelet.ntnx-ww-xx-yy-zz-a-pcvm.root.log.INFO.20230625-000120.175052
lrwxrwxrwx. 1 root root 67 Jun 25 04:01 kubelet.INFO -> kubelet.ntnx-ww-xx-yy-zz-a-pcvm.root.log.INFO.20230625-040145.21556
lrwxrwxrwx. 1 root root 70 Jun 25 04:01 kubelet.WARNING -> kubelet.ntnx-ww-xx-yy-zz-a-pcvm.root.log.WARNING.20230625-040145.21556
lrwxrwxrwx. 1 root root 68 Jun 25 04:01 kubelet.ERROR -> kubelet.ntnx-ww-xx-yy-zz-a-pcvm.root.log.ERROR.20230625-040148.21556
-rw-r--r--. 1 root root 25K Jun 25 04:02 kubelet.ntnx-ww-xx-yy-zz-a-pcvm.root.log.ERROR.20230625-040148.21556
-rw-r--r--. 1 root root 38K Jun 25 04:04 kubelet.ntnx-ww-xx-yy-zz-a-pcvm.root.log.WARNING.20230625-040145.21556
-rw-r--r--. 1 root root 6.0M Jun 25 04:04 kubelet.ntnx-ww-xx-yy-zz-a-pcvm.root.log.INFO.20230625-040145.21556
-rw-------. 1 nutanix nutanix 0 Jun 26 08:37 .nstat.u1000
-rw-r-----. 1 nutanix nutanix 0 Jun 26 08:38 lcm_metrics_uploader_lock
-rw-------. 1 nutanix nutanix 0 Jun 26 08:43 lazan_pc_greenlet_stack_dump
-rw-------. 1 nutanix nutanix 0 Jun 26 08:43 uhura_greenlet_stack_dump
drwxr-xr-x. 19 root root 4.0K Jun 26 08:44 ..
drwx------. 2 nutanix nutanix 4.0K Jun 26 09:24 hsperfdata_nutanix
drwxrwxrwt. 14 root root 4.0K Jun 26 09:35 .
Scenario 3
Starting with pc.2022.6, a dedicated 30 GB disk is created and mounted for PC upgrades. This vdisk is used for downloading and extracting upgrade binaries across consecutive upgrades. If the Prism Central VM upgrade disk is highly utilised, the Prism Central upgrade would fail due to lack of space on the upgrade disk.
nutanix@NTNX-PCVM:$ df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 13G 0 13G 0% /dev
tmpfs 13G 40K 13G 1% /dev/shm
tmpfs 13G 2.6M 13G 1% /run
tmpfs 13G 0 13G 0% /sys/fs/cgroup
/dev/sdb1 9.8G 7.2G 2.5G 75% /
/dev/sdb3 50G 14G 36G 28% /home
/dev/sde 30G 26G 4.1G 87% /home/nutanix/upgrade
/dev/sdc1 492G 147M 486G 1% /home/nutanix/data/stargate-storage/disks/NFS_2_0_271_960db4d2_45e7_4ef7_92bd_bdcd7e0b6aaf
tmpfs 2.6G 0 2.6G 0% /run/user/1000
Note: If several services are enabled on Prism Central, such as MSP, Karbon, Calm, Flow, and Objects, /home usage can be high because each of these services generates its own logs and configuration files. Nutanix engineers are constantly working towards improving /home usage. If none of the above scenarios matches and the issue affects a Prism Central upgrade, engage the Nutanix Support team to help manually clean up /home by trimming the logs.
Scenario 4
Due to a log file rotation issue, the Adonis logs directory usage may be high. Usage of this log directory should not exceed 1 GB. For scale-out Prism Central deployments, check the usage on each of the Prism Central VMs:
nutanix@pcvm:~/data/logs$ sudo du -h /home/nutanix/adonis/logs
6.1G /home/nutanix/adonis/logs/access
19G /home/nutanix
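On a scale-out deployment, the same check can be run on all PC VMs at once, for example:
nutanix@pcvm$ allssh "sudo du -sh /home/nutanix/adonis/logs"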
Scenario 5
In certain cases, catalina.out may consume a large amount of space on the Prism Central VM.
SSH to the Prism Central VM and check whether /home/nutanix/data/logs/catalina.out is consuming a large amount of space:
nutanix@PCVM:~$ allssh du -h /home/nutanix/data/logs/catalina.out
Scenario 6
For pc.2022.9 and above, high inode usage in the PC VM has been seen to cause high root partition usage.
SSH to the Prism Central VM and run the following command to verify the inode usage:
nutanix@PCVM:~$ allssh df -i /
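If the inode usage shown by df -i is high, the following generic command (not part of KB-6082, shown only as an illustration) can help identify directories on the root filesystem that contain unusually large numbers of files. The -xdev option limits the search to the root filesystem so that other mounted partitions are not counted:
nutanix@PCVM:~$ sudo find / -xdev -printf '%h\n' | sort | uniq -c | sort -rn | head -20
Follow KB-6082 for the supported cleanup procedure.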
Solution
If the check reports a WARN or FAIL status, disk usage is above the threshold and needs investigation. Generally, space utilization can be queried using df -h. In the output below, the mount points are as follows:
- /dev/sdb1 is the root partition
- /dev/sdb3 is the home partition
- /dev/sdc1 is a data disk partition
nutanix@pcvm$ df -h
Filesystem Size Used Avail Use% Mounted on
devtmpfs 7.9G 0 7.9G 0% /dev
tmpfs 7.9G 44K 7.9G 1% /dev/shm
tmpfs 7.9G 6.1M 7.9G 1% /run
tmpfs 7.9G 0 7.9G 0% /sys/fs/cgroup
/dev/sdb1 9.8G 7.4G 2.3G 77% /
/dev/sdb3 50G 8.5G 41G 18% /home
/dev/sdc1 492G 150M 486G 1% /home/nutanix/data/stargate-storage/disks/NFS_2_0_267_5a298323_3c9f_4a6f_a265_10c4c1e6593e
tmpfs 1.6G 0 1.6G 0% /run/user/1000
/dev/sde 98G 401M 93G 1% /home/nutanix/data/sys-storage/NFS_1_0_264_1f5cda9a_2b3f_4f49_b348_baeb0ae338b8
tmpfs 1.6G 0 1.6G 0% /run/user/0
Data disk usage (/dev/sdXX) or overall multivdisk usage:
Verify that the number of VMs managed is within the limit supported for the particular Prism Central size (consult the Prism Central Guide for your version on the Support Portal for the limits). Contact Nutanix Support and, while opening the support case, attach the output of the following commands to the case.
nutanix@pcvm$ allssh df -h
nutanix@pcvm$ ncc health_checks system_checks pcvm_disk_usage_check
Prism Central VM home partition (/home):
Inspect the NCC output to determine which Prism Central VM has high usage, then perform the following:
- Log in to the Prism Central VM.
- Use the cd command to change the location to the /home partition.
- List the contents of the directory by size using the command below:
nutanix@pcvm$ ls -al | sort -k5,5nr
Examine the output for any large unused files that can be deleted.
- Run the du command below to list the usage of each file and sub-directory:
nutanix@pcvm$ sudo du -skxh * | sort -h
Examine the output of large sub-directories. You can run the du command for each sub-directory in question to further identify large unused files that can be deleted.
- Below are some common sub-directories of /home where large unused files are likely to exist (a combined command to check these locations is shown after this list):
- /home/nutanix/software_downloads/ - delete any old versions other than the versions you are currently upgrading.
- /home/nutanix/software_uncompressed/ - delete any old versions other than the versions you are currently upgrading.
- /home/nutanix/data/cores - delete old stack traces that are no longer needed.
- /home/nutanix/data/log_collector/ - delete old NCC Logs with NCC-logs-2018-07-20-11111111111111-1032057545.tar format.
- /home/nutanix/foundation/isos/ - old ISOs.
- /home/nutanix/foundation/tmp/ - temporary files that can be deleted.
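To review the usage of these common locations in a single pass, a combined du command similar to the one below can be used (errors for paths that do not exist on a particular PC VM are suppressed):
nutanix@pcvm$ sudo du -sh /home/nutanix/software_downloads /home/nutanix/software_uncompressed /home/nutanix/data/cores /home/nutanix/data/log_collector /home/nutanix/foundation/isos /home/nutanix/foundation/tmp 2>/dev/null | sort -h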
If the above steps do not resolve the issue or if the issue matches one of the scenarios presented earlier in this article, follow the solution steps outlined below.
Prism Central VM root system partition (/) or CMSP partition (/dev/sdXX):
Consider engaging Nutanix Support. Gather the output of the commands below and attach it to the support case:
nutanix@pcvm$ allssh df -h
nutanix@pcvm$ sudo du -h --max-depth=1 / 2>/dev/null
nutanix@pcvm$ ncc health_checks system_checks pcvm_disk_usage_check
Scenario 1
The issue has been addressed and fixed in the releases: pc.2022.1, pc.2022.4, pc.2021.9.0.5 and later.
As a workaround, remove the directories that do not have registered clusters with the corresponding version.
- Find the AOS versions of all the registered PEs. You can do so from Prism Central > Hardware > Clusters > AOS version column.
- List the PE apache console directories under /home/apache/www/console/:
nutanix@pcvm$ sudo ls -lrth /home/apache/www/console/el7.3-release-*
- If any PE apache console directories that do not correspond to registered PE versions are present, they should be safe to clean up.
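For example, to remove one of the stale console directories (the directory name below is taken from the sample output above and is purely illustrative; substitute a directory you have confirmed does not match the AOS version of any registered PE):
nutanix@pcvm$ sudo rm -rf /home/apache/www/console/el7.3-release-euphrates-5.18.1.1-stable-4546d2908cb8495b316deb45de63b7f5e52541a1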
If you require further assistance with the cleanup, consider engaging Nutanix Support. Gather the output of the commands below and attach it to the support case:
nutanix@pcvm$ ncli cluster info
nutanix@pcvm$ allssh df -h
nutanix@pcvm$ sudo du -h --max-depth=1 /home/apache/www 2>/dev/null
nutanix@pcvm$ cat ~/config/upgrade.history
nutanix@pcvm$ ls -lrth /home/apache/www/console/el7.3-release-*
nutanix@pcvm$ du -sh /home/apache/www/console/el7.3-release-*
Removing older sysstats logs
If you have checked in all the usual places but still need to clean up space, you can get the customer’s permission to remove older sysstats logs. Double-check that there are no open support cases with pending RCAs before proceeding, as this data may not yet have been collected.
nutanix@PCVM:~$ sudo du -h -d 1 /home/nutanix/data/logs | sort -h
4.0K /home/nutanix/data/logs/ecr
28K /home/nutanix/data/logs/work
5.5M /home/nutanix/data/logs/kafka
127M /home/nutanix/data/logs/cassandra
162M /home/nutanix/data/logs/data_providers
368M /home/nutanix/data/logs/ikat_access_logs
4.2G /home/nutanix/data/logs/sysstats
11G /home/nutanix/data/logs
nutanix@PCVM:~$
You can use the following command to remove gzipped sysstats logs older than a certain date. In the example below, you will remove sysstats logs from all PCVMs that are older than 3 days (or 4320 minutes).
nutanix@PCVM:~$ allssh "find ~/data/logs/sysstats -name '*.gz' -mmin +4320 -type f -exec rm '{}' +"
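To preview which files would be removed before deleting anything, the same find command can be run without the -exec rm action (a cautious variant, not part of the original procedure):
nutanix@PCVM:~$ allssh "find ~/data/logs/sysstats -name '*.gz' -mmin +4320 -type f"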
Scenario 2
If your Prism Central instance matches this scenario, refer to KB-12707 Scenario #2 and open a case with Nutanix Support for assistance in recovering from the issue.
Scenario 3
The increase in Prism Central VM upgrade disk usage is due to multiple Prism Central Installer files. Reduce Prism Central VM upgrade partition usage by removing the Prism Central Installer files that are not needed.
The /home/nutanix/upgrade partition is designed to hold only upgrade-related files, so any other files within this directory can be deleted.
The partition usage as listed in df -h:
/dev/sde         30G   26G  4.1G  87% /home/nutanix/upgrade
To check the contents of the disk:
nutanix@pcvm$ allssh "ls -latr /home/nutanix/upgrade/"
To remove unwanted files, use the command below:
nutanix@pcvm$ rm -f /home/nutanix/upgrade/<file_to_be_removed>
Note: If you accidentally delete the Prism Central upgrade package in this folder, log in to Prism Central via a web browser, go to Prism Central Settings > Upgrade Prism Central, click the "X" next to the software upgrade, and re-download the package.
Scenario 4
Nutanix is aware of the issue. The fix for this issue will be made available in a future PC release. For a workaround, engage Nutanix Support.
Scenario 5
If the catalina.out log file is consuming a lot of space, use the following command to restart the Prism service on the PCVM.
nutanix@PCVM:~$ genesis stop prism; cluster start
In some rare cases, the catalina.out file will not automatically free up space after the Prism leader rolls over. To fix this, manually zero out the catalina.out log file after the Prism leader rolls over:
nutanix@PCVM:~$ echo "" > ~/data/logs/catalina.out
For single-instance PCVMs, perform this change while the Prism service is stopped, and then run cluster start.
Scenario 6
Follow KB-6082 to clear the inode usage.
Related Articles