VMware - Defect ID: 96427

Tanzu Mission Control Self-Managed (TMC-SM) won't start due to storage issues in underlying StatefulSets.

Last updated on 2/8/2024

Symptoms

Tanzu Mission Control Self-Managed (TMC-SM) won't start.

  • TMC-SM was deployed using VMware Cloud Director Extension for VMware Tanzu Mission Control to a VMware Cloud Director Container Service Extension Kubernetes cluster.
  • TMC-SM fails to start due to storage issues in underlying StatefulSets.
  • PVCs run out of storage space.

Checking the Pod logs, you see errors relating to disk usage.

    $ kubectl logs <POD_NAME>
    ts=2024-01-24T18:51:43.125Z caller=main.go:1166 level=error err="opening storage failed: open <device>: no space left on device"

Inspection of the Pod shows a full disk.

    $ kubectl exec <POD_NAME> -- df -h
    Filesystem      Size  Used Avail Use% Mounted on
    overlay          19G   13G  5.3G  71% /
    tmpfs            64M     0   64M   0% /dev
    tmpfs            16G     0   16G   0% /sys/fs/cgroup
    /dev/sda4        19G   13G  5.3G  71% /tmp
    /dev/sdb         10G   10G    0M 100% /mnt
    shm              64M     0   64M   0% /dev/shm
    tmpfs            32G   12K   32G   1% /run/secrets/kubernetes.io/serviceaccount
    tmpfs            16G     0   16G   0% /proc/acpi
    tmpfs            16G     0   16G   0% /proc/scsi
    tmpfs            16G     0   16G   0% /sys/firmware
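
When a Pod has many mounts, the full one can be easy to miss in the df output. The following is a minimal sketch, not part of the vendor KB, that filters df -h output for mounts at or above a usage threshold. The function name full_mounts and the default threshold of 95 are illustrative assumptions. It also assumes each filesystem is reported on a single line and that mount points contain no spaces.

```shell
#!/bin/sh
# Print "<mount point> <Use%>" for every filesystem whose usage is at or
# above a threshold (hypothetical default: 95). Reads `df -h` style output
# on stdin and skips the header line.
full_mounts() {
  threshold="${1:-95}"
  awk -v t="$threshold" '
    NR > 1 && $5 ~ /%$/ {
      pct = $5
      sub(/%$/, "", pct)          # strip the trailing percent sign
      if (pct + 0 >= t + 0) print $6 " " $5
    }'
}

# Possible usage against a live Pod (<POD_NAME> is a placeholder, as above):
#   kubectl exec <POD_NAME> -- df -h | full_mounts 95
```

Run against the sample output above, this would report only the /mnt mount, which is the volume at 100%.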

Purpose

This KB is designed to help identify which Pod is experiencing the issue. Once identified, use the steps in the following KB to resolve the issue: https://kb.vmware.com/s/article/96425

Resolution

There are 4 main Pods which can experience the issue, each of which has a different method by which you can review the current space usage.

  • AlertManager
  • Kafka
  • Postgres
  • Prometheus

AlertManager

    $ kubectl exec alertmanager-tmc-local-monitoring-tmc-local-0 -c alertmanager -- df -h /data
    Filesystem      Size  Used Avail Use% Mounted on
    /dev/sdd        2.0G  2.0G    0M 100% /data

Proceed to run KB 96425, replacing any reference to <PACKAGE_INSTALL_NAME> with tmc-local-monitoring.

Kafka

    $ kubectl exec kafka-controller-0 -c kafka -- df -h /bitnami/kafka
    Filesystem      Size  Used Avail Use% Mounted on
    /dev/sdc          5G    5G    0M 100% /bitnami/kafka

Proceed to run KB 96425, replacing any reference to <PACKAGE_INSTALL_NAME> with kafka.

Postgres

    $ kubectl exec postgres-postgresql-0 -c postgresql -- df -h /bitnami/postgresql
    Filesystem      Size  Used Avail Use% Mounted on
    /dev/sdb        7.8G  7.8G    0M 100% /bitnami/postgresql

Proceed to run KB 96425, replacing any reference to <PACKAGE_INSTALL_NAME> with postgres.

Prometheus

    $ kubectl get pods
    prometheus-server-tmc-local-monitoring-tmc-local-0   1/2   CrashLoopBackOff   1327 (3m52s ago)   13d

    $ kubectl logs prometheus-server-tmc-local-monitoring-tmc-local-0 -c prometheus
    ts=2024-01-24T18:51:43.125Z caller=main.go:1166 level=error err="opening storage failed: open /prometheus/wal/00000582: no space left on device"

Proceed to run KB 96425, replacing any reference to <PACKAGE_INSTALL_NAME> with tmc-local-monitoring.
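
The Pod-to-package mapping above can be written down as a small lookup, together with a check for the disk-full message in Pod logs. This is a sketch only: the function names package_for_pod and has_disk_full_error are hypothetical helpers, while the Pod names, package names, and error text are taken from this KB.

```shell
#!/bin/sh
# Map an affected Pod name to the <PACKAGE_INSTALL_NAME> value given in
# this KB. Unknown Pod names produce an error on stderr and exit status 1.
package_for_pod() {
  case "$1" in
    alertmanager-tmc-local-monitoring-tmc-local-0) echo "tmc-local-monitoring" ;;
    prometheus-server-tmc-local-monitoring-tmc-local-0) echo "tmc-local-monitoring" ;;
    kafka-controller-0) echo "kafka" ;;
    postgres-postgresql-0) echo "postgres" ;;
    *) echo "unknown Pod: $1" >&2; return 1 ;;
  esac
}

# Succeed if the log text on stdin contains the disk-full error shown above.
has_disk_full_error() {
  grep -q 'no space left on device'
}

# Possible usage for the Prometheus case, where the Pod is in
# CrashLoopBackOff and the logs are the documented way to confirm the issue:
#   pod=prometheus-server-tmc-local-monitoring-tmc-local-0
#   kubectl logs "$pod" -c prometheus | has_disk_full_error && package_for_pod "$pod"
```

The printed value is what would replace <PACKAGE_INSTALL_NAME> when following KB 96425.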
