
OPERATIONAL DEFECT DATABASE
...

...
Summary:
A deployed PFMP cluster starts showing multiple pod restarts, and a node may go into a NotReady state, making the PFMP UI inaccessible.

Scenario:
A PowerFlex Management Platform (PFMP) instance is deployed on customer-supplied infrastructure that has slow storage and does not meet the storage requirements for PFMP. The symptoms below can be used to identify this issue. Slow storage may cause stability issues with the K8s cluster or the Postgres database members (such as frequent state changes). In a fresh deployment, no issues may be observed until the PowerFlex Manager automation is triggered to deploy the PowerFlex cluster. Various pods may restart multiple times or go into a CrashLoopBackOff (CLBO) state.

Below are the symptoms that are observed in a production environment.

Run the below commands to see the events in the cluster:

kubectl get events
kubectl describe node

Events:
  Type     Reason                   Age                From             Message
  ----     ------                   ----               ----             -------
  Normal   RegisteredNode           36m                node-controller  Node pfmp-mvm-cl1-02 event: Registered Node pfmp-mvm-cl1-02 in Controller
  Normal   NodeNotReady             35m                node-controller  Node pfmp-mvm-cl1-02 status is now: NodeNotReady
  Normal   Starting                 33m                kubelet          Starting kubelet.
  Warning  InvalidDiskCapacity      33m                kubelet          invalid capacity 0 on image filesystem
  Normal   NodeHasSufficientMemory  33m (x2 over 33m)  kubelet          Node pfmp-mvm-cl1-02 status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure    33m (x2 over 33m)  kubelet          Node pfmp-mvm-cl1-02 status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID     33m (x2 over 33m)  kubelet          Node pfmp-mvm-cl1-02 status is now: NodeHasSufficientPID
  Normal   NodeNotReady             33m                kubelet          Node pfmp-mvm-cl1-02 status is now: NodeNotReady
  Normal   NodeAllocatableEnforced  33m                kubelet          Updated Node Allocatable limit across pods
  Normal   NodeReady                33m                kubelet          Node pfmp-mvm-cl1-02 status is now: NodeReady

kubelet logging:
Below is sample output showing the kubelet failure. Logs are found in the below path on each node:

/var/lib/rancher/rke2/agent/logs/kubelet.log

Aug 04 17:12:11 pfmp-mvm-cl1-02 rke2[31392]: time="2023-08-04T17:12:11+09:00" level=debug msg="Wrote ping"
Aug 04 17:12:12 pfmp-mvm-cl1-02 rke2[31392]: E0804 17:12:12.654816 31392 leaderelection.go:367] Failed to update lock: etcdserver: request timed out
Aug 04 17:12:13 pfmp-mvm-cl1-02 rke2[31392]: time="2023-08-04T17:12:13+09:00" level=debug msg="Wrote ping"
Aug 04 17:12:14 pfmp-mvm-cl1-02 rke2[31392]: I0804 17:12:14.526253 31392 leaderelection.go:283] failed to renew lease kube-system/rke2: timed out waiting for the condition
Aug 04 17:12:16 pfmp-mvm-cl1-02 rke2[31392]: time="2023-08-04T17:12:16+09:00" level=debug msg="Wrote ping"
Aug 04 17:12:17 pfmp-mvm-cl1-02 systemd[1]: run-containerd-runc-k8s.io-1ee2f8a41e076afeb4f14eb53d7faea3b1b11c59ecc44aeed0c3ee1333b07a01-runc.DLNAIb.mount: Succeeded.
Aug 04 17:12:23 pfmp-mvm-cl1-02 rke2[31392]: time="2023-08-04T17:12:23+09:00" level=debug msg="Wrote ping"
Aug 04 17:12:26 pfmp-mvm-cl1-02 rke2[31392]: E0804 17:12:26.235602 31392 leaderelection.go:306] Failed to release lock: Operation cannot be fulfilled on configmaps "rke2": the object has been modified; please apply your changes to the latest version and try again
Aug 04 17:12:26 pfmp-mvm-cl1-02 rke2[31392]: time="2023-08-04T17:12:26+09:00" level=fatal msg="leaderelection lost for rke2"

etcd logging:
Run the below commands to find the pod for each etcd member and extract the logs:

# kubectl get pods -n kube-system | grep etcd
etcd-node1   1/1   Running   1   92d
etcd-node2   1/1   Running   1   92d
etcd-node3   1/1   Running   1   92d

# kubectl logs --follow -n kube-system etcd-node1 >> etcd.txt

The following logging is seen in the etcd logs:

2023-08-04T17:12:12.223766355+09:00 stderr F {"level":"warn","ts":"2023-08-04T08:12:12.223Z","caller":"etcdserver/util.go:166","msg":"apply request took too long","took":"16.840314464s","expected-duration":"100ms","prefix":"read-only range ","request":"key:\"/registry/operator.tigera.io/installations/\" range_end:\"/registry/operator.tigera.io/installations0\" count_only:true ","response":"","error":"etcdserver: request timed out"}

Alternatively, the same logs can be found on each node in the below path as well:

/var/log/pods/kube-system_etcd-_e18aa5e5b83a5a3c56d78e4054612394/etcd

Impact:
This may lead to PFMP cluster node stability issues in which pods restart multiple times, causing the UI to be inaccessible.
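Note: The following quick check is not part of the original article. Assuming kubectl access to the PFMP cluster and default kubectl column layout, it lists pods in any namespace that are in CrashLoopBackOff or have accumulated more than three restarts (the restart threshold is arbitrary):

kubectl get pods -A --no-headers | awk '$4=="CrashLoopBackOff" || $5+0 > 3 {print $1, $2, $4, $5}'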
The etcd response time is sensitive to slow storage. If the MVMs' underlying storage does not meet the requirements, slow performance may be seen within the etcd environment. Storage response times exceeding 1 s can also lead to various kubelet process crashes. For example, storage consisting of hybrid drives (both HDD and SSD) can lead to performance issues with etcd.

The below command can be run to verify the etcd performance:

for x in $(kubectl get pods -n kube-system | grep etcd | awk '{print $1}') ; do echo "------------------------"; echo $x; echo; kubectl exec -it -n kube-system $x -- etcdctl check perf --cacert="/var/lib/rancher/rke2/server/tls/etcd/server-ca.crt" --cert="/var/lib/rancher/rke2/server/tls/etcd/server-client.crt" --key="/var/lib/rancher/rke2/server/tls/etcd/server-client.key"; echo "------------------------"; echo; sleep 5; done

Below is sample output for an etcd performance failure:

------------------------
etcd-sio-car-pfmp-mvm-01

60 / 60 Boooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo! 100.00% 1m0s
PASS: Throughput is 144 writes/s
Slowest request took too long: 1.282354s   <<<<<<<<<<<<<<<<<<<<<<<
Stddev too high: 0.151896s
FAIL
command terminated with exit code 1
------------------------
------------------------
etcd-sio-car-pfmp-mvm-02

60 / 60 Boooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo! 100.00% 1m0s
PASS: Throughput is 150 writes/s
Slowest request took too long: 0.517011s
PASS: Stddev is 0.064465s
FAIL
command terminated with exit code 1
------------------------
------------------------
etcd-sio-car-pfmp-mvm-03

60 / 60 Boooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooooo! 100.00% 1m0s
FAIL: Throughput is 138 writes/s
Slowest request took too long: 2.505517s   <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Stddev too high: 0.207719s
FAIL
command terminated with exit code 1
------------------------

The etcd performance test fails if the following limits are not met:

Measure             Limit
Throughput          > 140 writes/s
Slowest Request     < 500 ms
Standard Deviation  < 100 ms
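Note: As a supplementary check that is not part of the original article, the fsync latency of the disk backing the etcd data can be benchmarked with fio (the tool must be installed on the MVM; the rke2 data path below is an assumption, so adjust it to a directory on the same filesystem as the etcd data):

fio --rw=write --ioengine=sync --fdatasync=1 --directory=/var/lib/rancher/rke2/server/db --size=22m --bs=2300 --name=etcd-fsync-check

Per upstream etcd guidance, a 99th-percentile fdatasync latency above roughly 10 ms suggests the storage is too slow for etcd. Remove the fio test file afterward.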
The issue is not in the PFMP version but in how the cluster VMs are deployed. The MVMs' underlying storage should reside local to the node where they are deployed, and the drives should be at least SSD. Also, VMware snapshots, while useful, can impact guest performance; to minimize this effect, use snapshots only temporarily and delete them promptly when no longer needed.

If the underlying storage for the MVMs is provided by PowerFlex, the system can be fine-tuned to ensure optimal performance for PFMP:

1. From the Primary MDM, adjust the SDS thread count to 12:

scli --set_performance_parameters --all_sds --tech --sds_number_os_threads 12

2. From the Primary MDM, adjust the SDC thread count to 10:

scli --set_performance_parameters --all_sdc --tech --sdc_number_network_os_threads 10

3. From the Primary MDM, adjust the IO flow control:

scli --set_performance_parameters --all_sdc --tech --sdc_max_inflight_requests 300
scli --set_performance_parameters --all_sdc --tech --sdc_max_inflight_data 30

4. From the Primary MDM, disable queue partition:

scli --set_inflight_requests_flow_control --protection_domain_name domain1 --disable_flow_control
scli --set_inflight_bandwidth_flow_control --protection_domain_name domain1 --disable_flow_control

5. On all of the SDSes that are contributing storage devices to the PowerFlex system hosting the MVMs, ensure that the disk scheduler is set to none for all the SSD devices (one way to change the scheduler is shown in the note at the end of this article). Example output of how it should look:

cat /sys/block/sd*/queue/scheduler
[none] mq-deadline kyber bfq
[none] mq-deadline kyber bfq
[none] mq-deadline kyber bfq
...

Impacted Versions: PFMP 4.x
Fixed In Version: N/A - Not a PowerFlex issue.
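Note: The article does not spell out how to change the scheduler. One common, non-persistent way, assuming the device name is sdX (repeat per SSD device, and persist the setting via a udev rule or tuned profile according to your distribution's practice), is:

echo none > /sys/block/sdX/queue/scheduler

The current setting for all devices can be reviewed with:

for d in /sys/block/sd*/queue/scheduler; do echo "$d: $(cat $d)"; done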