...
The isi_cpool_d process shows continuous high CPU utilization on the cluster:

    Isilon-1# top -n 10
      PID USERNAME THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
    87857 root     124  20    0   595M   173M nanslp 13 1722.5 857.62% isi_cpool_d
     3455 root      29  98 r150   397M    86M sigwai 10 4216.2  62.55% nfs
     3313 root      40  98 r150  1018M   683M sigwai 14 7402.9  47.71% lwio
    94259 root      13  52    0   566M   491M usem   18 374.1H  32.57% isi_celog_monitor
    18378 root       5  20    0   102M    53M uwait   3  49:57  24.56% isi_job_d
    34552 root       1  52    0    37M    15M adv    22 112.6H  20.51% isi_migr_sched
     3144 root      13  20    0    52M    13M select  8 2009.5  15.33% isi_audit_d
    98432 root       1  52    0   105M    66M kqread 26 417:47  14.55% isi_celog_analysis
     3213 root      26  52    0    96M    28M uwait  10 1109.2  12.50% isi_avscan_d
    51167 root       5  20    0    93M    42M uwait  21  74:37  10.40% isi_job_d
    ...

Multiple CloudPools jobs may be running on the cluster, but isi_cpool_d utilization remains high even when all jobs are paused:

    Isilon-1# isi cloud jobs list
    ID   Description                               Effective State  Type
    ----------------------------------------------------------------------------------------
    1    Write updated data to the cloud           paused           cache-writeback
    2    Expire CloudPools cache                   paused           cache-invalidation
    4    Clean up unreferenced data in the cloud   paused           cloud-garbage-collection
    5    Write updated snapshot data to the cloud  paused           snapshot-writeback
    6    Update SmartLink file formats             paused           smartlink-upgrade
    7    Add data to CloudPools cache              paused           cache-pre-populate
    959                                            paused           archive
    960                                            paused           archive
    961                                            paused           archive
    962                                            paused           archive
    964                                            paused           archive
    965                                            paused           archive
    966                                            paused           archive
    967                                            paused           archive
    968                                            paused           archive
    ----------------------------------------------------------------------------------------

    Isilon-1# top -n 5
      PID USERNAME THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
    87857 root     124  20    0   588M   180M nanslp  4 1723.5 805.81% isi_cpool_d
     3455 root      28  98 r150   397M    87M sigwai 10 4216.3  69.34% nfs
    18378 root       6  20    0   122M    72M uwait   9  53:18  68.36% isi_job_d
     3313 root      49  98 r150  1019M   684M sigwai 14 7403.0  66.16% lwio
    51167 root       6  20    0    94M    42M uwait  26  76:02  22.36% isi_job_d
    ...
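When sampling repeatedly while troubleshooting, a small parser can pull the isi_cpool_d figure out of captured output. The following is a minimal sketch, assuming the column layouts shown above; the helper names and the trimmed sample strings are hypothetical, and the jobs parser assumes the Effective State value is a single word. In this `top` output, a WCPU above 100% indicates that the process's threads together are using more than one CPU core, so 857.62% is roughly eight and a half cores' worth of CPU.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical sample constants: trimmed copies of the output shown above.
TOP_SAMPLE = """\
  PID USERNAME THR PRI NICE   SIZE    RES STATE   C   TIME    WCPU COMMAND
87857 root     124  20    0   595M   173M nanslp 13 1722.5 857.62% isi_cpool_d
 3455 root      29  98 r150   397M    86M sigwai 10 4216.2  62.55% nfs
18378 root       5  20    0   102M    53M uwait   3  49:57  24.56% isi_job_d
51167 root       5  20    0    93M    42M uwait  21  74:37  10.40% isi_job_d
...
"""

JOBS_SAMPLE = """\
ID   Description                      Effective State  Type
-----------------------------------------------------------
1    Write updated data to the cloud  paused           cache-writeback
2    Expire CloudPools cache          paused           cache-invalidation
959                                   paused           archive
-----------------------------------------------------------
"""


def wcpu_by_command(top_text: str) -> Dict[str, float]:
    """Sum the WCPU column per COMMAND in FreeBSD-style `top` output."""
    totals: Dict[str, float] = defaultdict(float)
    for line in top_text.splitlines():
        fields = line.split()
        # Process rows start with a numeric PID and have at least 12 columns;
        # the header and "..." lines are skipped.
        if len(fields) < 12 or not fields[0].isdigit():
            continue
        wcpu, command = fields[10], " ".join(fields[11:])
        if wcpu.endswith("%"):
            totals[command] += float(wcpu[:-1])
    return dict(totals)


def job_states(jobs_text: str) -> List[Tuple[str, str, str]]:
    """Return (ID, Effective State, Type) for each row of `isi cloud jobs list`.

    Assumes the state is a single word immediately before the Type column.
    """
    rows: List[Tuple[str, str, str]] = []
    for line in jobs_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0].isdigit():
            rows.append((fields[0], fields[-2], fields[-1]))
    return rows


if __name__ == "__main__":
    totals = wcpu_by_command(TOP_SAMPLE)
    print("top consumer:", max(totals, key=totals.get))
    print("all jobs paused:", all(s == "paused" for _, s, _ in job_states(JOBS_SAMPLE)))
```

Summing per COMMAND matters because some daemons, such as isi_job_d above, appear on more than one row.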
Certain operations, such as cache-writeback and cache-invalidation, run in the background and do not correspond directly to any running CloudPools job. Pausing CloudPools jobs does not stop these operations, so their threads keep running and keep CPU utilization high.

To confirm this, pause the cache-writeback and cache-invalidation operations while monitoring CPU utilization. The CPU utilization of isi_cpool_d should drop quickly once they are paused and climb again once they are resumed.

To pause CloudPools operations:

    # isi cloud jobs pause cache-writeback
    # isi cloud jobs pause cache-invalidation

To resume CloudPools operations:

    # isi cloud jobs resume cache-invalidation
    # isi cloud jobs resume cache-writeback
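The confirmation procedure can be scripted. The following is a minimal sketch, assuming it runs on a cluster node where a Python 3 interpreter is available and the `isi` CLI is on the PATH (both assumptions). Only the four `isi cloud jobs pause`/`resume` commands come from this article; the helper names, the 60-second settle time, and the 50% drop threshold are hypothetical. The WCPU sampler is passed in as a function (for example, one that reads the isi_cpool_d row from `top -n 10` output), so the logic can be exercised without a cluster.

```python
import subprocess
import time
from typing import Callable, List, Sequence, Tuple

# Order follows the procedure above: pause cache-writeback first, resume it last.
PAUSE_ORDER = ("cache-writeback", "cache-invalidation")
RESUME_ORDER = ("cache-invalidation", "cache-writeback")


def run_isi(argv: Sequence[str]) -> None:
    """Run one isi command; raises CalledProcessError on a non-zero exit."""
    subprocess.run(list(argv), check=True)


def pause_and_compare(
    sample_wcpu: Callable[[], float],
    runner: Callable[[Sequence[str]], None] = run_isi,
    settle_seconds: float = 60.0,  # hypothetical settle time
    min_drop: float = 0.5,  # hypothetical threshold: at least a 50% drop
) -> Tuple[float, float, bool]:
    """Compare isi_cpool_d WCPU before and while both operations are paused.

    Returns (before, after, dropped). Every operation this function paused is
    resumed before it returns or raises.
    """
    before = sample_wcpu()
    paused: List[str] = []
    try:
        for op in PAUSE_ORDER:
            runner(["isi", "cloud", "jobs", "pause", op])
            paused.append(op)
        time.sleep(settle_seconds)
        after = sample_wcpu()
    finally:
        # Resume in the documented order, but only what was actually paused.
        for op in RESUME_ORDER:
            if op in paused:
                runner(["isi", "cloud", "jobs", "resume", op])
    dropped = after <= before * (1.0 - min_drop)
    return before, after, dropped
```

The try/finally block resumes whatever was paused even if sampling fails, because these operations should not stay paused for long.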
It is not advisable to leave the cache-writeback and cache-invalidation operations paused for a prolonged period. Incomplete tasks and operations accumulate while they are paused and magnify the problem.

High CPU utilization caused by cache writeback or cache invalidation may indicate that a large amount of caching has occurred, usually because a large amount of data is being archived and then recalled inline. This can result from poorly written archiving criteria in File Pool Policies: archiving data without regard to access time can cause excessive caching of active files.

The following is an example of a poorly written File Pool Policy that archives data to an ECS CloudPool. Note that any data within the designated paths is archived to the CloudPool immediately, regardless of how recently it was accessed:

    --------------------------------------------------------------------------------
    Name: Bad ECS Cloud Policy
    Description: Tier to ECS CloudPools
    State: OK
    CloudPools Details:
    Apply Order: 3
    File Matching Pattern: Path == APPS/SeaShoreVideo (begins with) OR Path == APPS/OceanArchive (begins with)
    Set Requested Protection: -
    Data Access Pattern: -
    Enable Coalescer: -
    Enable Packing: -
    Data Storage Target: -
    Data SSD Strategy: -
    Snapshot Storage Target: -
    Snapshot SSD Strategy: -
    Cloud Pool: EMC ECS Pool
    Cloud Compression Enabled: Yes
    Cloud Encryption Enabled: No
    Cloud Data Retention: 1W
    Cloud Incremental Backup Retention: 5Y
    Cloud Full Backup Retention: 5Y
    Cloud Accessibility: cached
    Cloud Read Ahead: partial
    Cloud Cache Expiration: 1D
    Cloud Writeback Frequency: 9H
    ID: Bad ECS Cloud Policy
    --------------------------------------------------------------------------------

The following is an example of a properly written File Pool Policy that accommodates active and recently accessed files. This policy adds Accessed Time criteria, so only data that has not been accessed for more than 5 weeks and 5 days (5W5D) is archived to the CloudPool:

    --------------------------------------------------------------------------------
    Name: Good ECS Cloud Policy
    Description: Tier to ECS CloudPools
    State: OK
    CloudPools Details:
    Apply Order: 3
    File Matching Pattern: Accessed Time > 5W5D AND Path == APPS/SeaShoreVideo (begins with) OR Accessed Time > 5W5D AND Path == APPS/OceanArchive (begins with)
    Set Requested Protection: -
    Data Access Pattern: -
    Enable Coalescer: -
    Enable Packing: -
    Data Storage Target: -
    Data SSD Strategy: -
    Snapshot Storage Target: -
    Snapshot SSD Strategy: -
    Cloud Pool: EMC ECS Pool
    Cloud Compression Enabled: Yes
    Cloud Encryption Enabled: No
    Cloud Data Retention: 1W
    Cloud Incremental Backup Retention: 5Y
    Cloud Full Backup Retention: 5Y
    Cloud Accessibility: cached
    Cloud Read Ahead: partial
    Cloud Cache Expiration: 1D
    Cloud Writeback Frequency: 9H
    ID: Good ECS Cloud Policy
    --------------------------------------------------------------------------------

Other causes of high isi_cpool_d CPU utilization vary depending on cluster configuration, settings, and code level. Contact Dell Technical Support if assistance is needed.
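To see why the first policy causes excessive caching, the two File Matching Patterns in the policies above can be modeled as plain predicates. This is a hypothetical model for illustration, not how OneFS evaluates policies internally: the function names are invented, "begins with" is simplified to a string-prefix test on the relative paths shown (so it would also match a sibling such as APPS/SeaShoreVideos), and Accessed Time > 5W5D is read as "last accessed more than 5 × 7 + 5 = 40 days ago".

```python
from datetime import datetime, timedelta
from typing import Tuple

# Paths from the example policies, treated as plain string prefixes (an assumption).
ARCHIVE_PATHS: Tuple[str, ...] = ("APPS/SeaShoreVideo", "APPS/OceanArchive")

# "Accessed Time > 5W5D": 5 weeks + 5 days = 40 days.
ACCESS_THRESHOLD = timedelta(weeks=5, days=5)


def bad_policy_matches(path: str, last_access: datetime, now: datetime) -> bool:
    """Path-only pattern: the access-time arguments are deliberately ignored."""
    return path.startswith(ARCHIVE_PATHS)


def good_policy_matches(path: str, last_access: datetime, now: datetime) -> bool:
    """(Accessed Time > 5W5D AND path 1) OR (Accessed Time > 5W5D AND path 2)."""
    idle = now - last_access
    # Each AND group must be fully satisfied; the OR accepts any one group.
    return any(idle > ACCESS_THRESHOLD and path.startswith(prefix)
               for prefix in ARCHIVE_PATHS)


if __name__ == "__main__":
    now = datetime(2024, 1, 31)
    samples = [
        ("APPS/SeaShoreVideo/clip.mp4", now - timedelta(days=2)),   # active file
        ("APPS/OceanArchive/reel.mov", now - timedelta(days=41)),   # idle file
    ]
    for path, atime in samples:
        print(path, bad_policy_matches(path, atime, now),
              good_policy_matches(path, atime, now))
```

Under this model, a file read two days ago under APPS/SeaShoreVideo matches the bad policy and would be archived, then recalled and cached again on its next access, while the good policy leaves it in place until it has been idle for more than 40 days.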