...
The vRealize Operations internal certificate has expired. This can manifest as being unable to log into the Admin UI. The cluster is Offline and you are unable to bring it Online, and you see a message similar to: "Data Retriever is not initialized yet. Please wait."
The vRealize Operations internal certificate will expire soon.
Note: This article is not applicable or required for VMware Aria Operations 8.12 or later.
The internal certificate in vRealize Operations is generated upon initial deployment. Currently, upgrading to later versions of vRealize Operations does not upgrade the internal certificate.
Note: It is not possible or supported to replace the internal certificate with a custom certificate.
The following instructions are for vRealize Operations 6.3 - 8.10.x.
There is no recovery option for vRealize Operations Manager 6.2.x and earlier.
Upgrading to vRealize Operations 8.6 also upgrades the internal certificate during the upgrade process, except for setups with Cloud Proxies connected. The VMware vRealize Operations Certificate Renewal PAK file must be applied on vRealize Operations 8.6 only if the internal certificate has expired and a Cloud Proxy node was connected to vRealize Operations before the upgrade to 8.6.
If you are on vRealize Operations 8.5 or lower with Cloud Proxies connected, you must upgrade to vRealize Operations 8.6 before upgrading to vRealize Operations 8.10 or later.
If you are on vRealize Operations 8.5 or lower and you do not have Cloud Proxies connected, you can upgrade directly to vRealize Operations 8.10 or later, within the bounds of your Upgrade Path.
After upgrading the vRealize Operations certificate by PAK file, any Cloud Proxy certificates need to be upgraded manually (vRealize Operations 8.4 or later). This must be done by extracting the root certificate from vRealize Operations using any browser and uploading it to the Cloud Proxy by following Add CA certs while deploying a cloud proxy in vRealize Operations 8.4 or later.
Before following the resolution below, it is vital to snapshot the vRealize Operations nodes by following How to take a Snapshot of vRealize Operations.
Note: If the certificate has already expired and the Admin UI is not accessible, steps 1, 2, 10, and 11 of the above-mentioned KB can be skipped.
Note: Upgrading vRealize Operations is not a replacement for this article. The steps below must still be followed.
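The version rules above can be summarized as a small decision helper. This is only a sketch and not a VMware tool: the function names version_ge and renewal_guidance and the keywords they print are hypothetical, the version is assumed to be a purely numeric dotted string such as 8.6.0, and because the article does not say how versions between 8.10.x and 8.12 are handled, those are reported as not covered.

```shell
# version_ge A B: succeed when dotted numeric version A is >= version B.
# Missing components count as 0, so 8.6 and 8.6.0 compare as equal.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, "."); nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = (i <= na) ? x[i] + 0 : 0
            yi = (i <= nb) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

# renewal_guidance VERSION [CLOUD_PROXIES]   (CLOUD_PROXIES is yes or no)
# Prints one hypothetical keyword summarizing the rules stated in this article.
renewal_guidance() {
    rg_version="$1"
    rg_proxies="${2:-no}"
    if version_ge "$rg_version" 8.12; then
        echo "not-applicable"          # Aria Operations 8.12 or later
    elif version_ge "$rg_version" 8.11; then
        echo "not-covered"             # between 8.10.x and 8.12: not stated here
    elif ! version_ge "$rg_version" 6.3; then
        echo "no-recovery"             # 6.2.x and earlier
    elif ! version_ge "$rg_version" 8.6 && [ "$rg_proxies" = "yes" ]; then
        echo "renew-upgrade-via-8.6"   # 8.5 or lower with Cloud Proxies connected
    else
        echo "renew"                   # 6.3 - 8.10.x: follow the steps below
    fi
}
```

For example, renewal_guidance 8.5.0 yes prints renew-upgrade-via-8.6, while renewal_guidance 6.2.1 prints no-recovery. In every case where the keyword starts with renew, the certificate renewal steps in this article still apply; the suffix only reflects the upgrade-path rule.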
Identify if Certificate Renewal is Required

First, validate whether certificate renewal is required. If the certificate is not yet expired, it can be checked from a web browser. See the steps below for the most common web browsers:
- Mozilla Firefox
- Google Chrome
- Microsoft Edge
Alternatively, if the certificate is expired or the UI is inaccessible, the certificate must be checked from the Primary node's command line:
- Command Line
Note: Starting in vRealize Operations 8.0, a pop-up is displayed in the UI warning when certificate expiration will occur.

Mozilla Firefox

1. Open https://Primary_Node_IP_or_FQDN:6061.
   Notes:
   - Replace Primary_Node_IP_or_FQDN with the actual IP or FQDN of the vRealize Operations Primary node.
   - The page displays a Warning: Potential Security Risk Ahead or Secure Connection Failed message; this is expected.
   - The Gemfire service must be running for a certificate to be presented.
   - No web page is expected to load. This is normal behavior; continue with the steps.
2. Click on Advanced and then on View Certificate.
3. Check the certificate end date under Period of Validity.

Google Chrome

1. Open https://Primary_Node_IP_or_FQDN:6061.
   Notes:
   - Replace Primary_Node_IP_or_FQDN with the actual IP or FQDN of the vRealize Operations Primary node.
   - The page displays a Your connection is not private or This site can't provide a secure connection message; this is expected.
   - The Gemfire service must be running for a certificate to be presented.
   - No web page is expected to load. This is normal behavior; continue with the steps.
2. Click on Not secure in the address bar, then click on Certificate (Invalid).
3. Check the certificate end date under Valid From.

Microsoft Edge

1. Open https://Primary_Node_IP_or_FQDN:6061.
   Notes:
   - Replace Primary_Node_IP_or_FQDN with the actual IP or FQDN of the vRealize Operations Primary node.
   - The page displays a This site is not secure message; this is expected.
   - The Gemfire service must be running for a certificate to be presented.
   - No web page is expected to load. This is normal behavior; continue with the steps.
2. Click on Certificate error in the address bar, then click on View certificate.
3. Check the certificate end date under Valid To.

Command Line

If the certificate is expired or the UI is inaccessible, the certificate must be checked from the Primary node's command line.
1. Log into the Primary node as root via SSH or Console.
2. Run the following command:
   /bin/grep -E --color=always -B1 'java.security.cert.CertPathValidatorException: validity check failed|java.security.cert.CertificateExpiredException' $ALIVE_BASE/user/log/*.log | /usr/bin/tail -20
Notes:
- If step 2 returns nothing, certificate renewal is not yet required.
- If step 2 returns output containing validity check failed, certificate renewal is required immediately.

Certificate Renewal

To renew the certificate, install the applicable PAK file to generate a new internal certificate.
Depending on whether the certificate has expired, choose the applicable steps to install the PAK file:
- Internal Certificate Not Expired
- Internal Certificate Expired

Internal Certificate Not Expired

If the vRealize Operations internal certificate has not yet expired, install the vRealize Operations Certificate Renewal PAK file while the vRealize Operations cluster is in an Offline state.
Note: Ensure all of the following steps are completed on all nodes in the vRealize Operations cluster unless noted otherwise.
1. Snapshot the vRealize Operations nodes by following How to take a Snapshot of vRealize Operations.
2. Download the Certificate Renewal PAK file for your version of vRealize Operations from the VMware Patch Portal.
   a. Select vROps Certificate Renewal as the Product.
   b. Select the vRealize Operations product version.
      Notes:
      - For versions 6.3 to 8.1.1, select version 8.0.0. This PAK is compatible with vRealize Operations versions 6.3 to 8.1.1.
      - For versions 8.4.x to 8.10.x, select version 8.4.0. This PAK is compatible with vRealize Operations versions 8.4 to 8.10.x.
3. Log into all nodes in the vRealize Operations cluster as root via SSH or Console.
4. Log into the vRealize Operations Admin UI as the local admin user.
5. Click Take Offline under Cluster Status.
   Note: Wait for Cluster Status to show as Offline.
6. Click Software Update in the left panel.
7. Click Install a Software Update in the main panel.
8. Follow the steps in the wizard to locate and install your PAK file.
9. Install the certificate renewal PAK file.
10. Wait for the software update to complete. When it does, the Administrator interface logs you out.
    Note: If the cluster does not report the installation as Completed after a long time, complete the 4 steps listed just after step 14.
11. Log into the vRealize Operations Admin UI as the local admin user.
12. Clear the browser caches and, if the browser page does not refresh automatically, refresh the page.
13. Click Bring Online under Cluster Status.
    Note: The cluster status changes to Going Online. When the cluster status changes to Online, the upgrade is complete.
14. Run the following commands on all nodes in the vRealize Operations cluster:
    chown admin:admin -R /storage/vcops/user/conf/ssl/ /storage/vcops/user/conf/ssl_bak/ /storage/db/casa/webapp/hsqldb/
    chown -h root:root /storage/vcops/user/conf/ssl/web_cert.pem /storage/vcops/user/conf/ssl/web_chain.pem /storage/vcops/user/conf/ssl/web_key.pem
    chmod guo+r -R /storage/vcops/user/conf/ssl/
    chmod 444 /storage/vcops/user/conf/ssl/cacert.pem /storage/vcops/user/conf/ssl/slice_*_cert.pem
    chmod 400 /storage/vcops/user/conf/ssl/cakey.pem /storage/vcops/user/conf/ssl/slice_*_cert.pfx /storage/vcops/user/conf/ssl/slice_*_key.pem
    chmod 640 /storage/vcops/user/conf/ssl/tcserver.keystore
    Note: For version 8.4 and later, also run the following commands on the Primary node, the Primary Replica node (if present), and all Data nodes:
    chown postgres:root /storage/vcops/user/conf/ssl/postgres_vcopsrepl_*
    chmod 600 /storage/vcops/user/conf/ssl/postgres_vcops_key.pk8 /storage/vcops/user/conf/ssl/postgres_vcopsrepl_key.pem
    chmod 640 /storage/vcops/user/conf/ssl/postgres_vcops_cert.pem /storage/vcops/user/conf/ssl/postgres_vcopsrepl_cert.pem

If, after a long time, the Admin UI does not report the installation of the Certificate Renewal PAK file as completed, complete the following steps.
1. Log into the Primary node as root via SSH or Console.
2. Run the following command to update the PAK installation status:
   sed -i -e 's/\"initialization_state\"\:\"INITIALIZING\"/\"initialization_state\"\:\"NONE\"/g' /data/db/casa/webapp/hsqldb/casa.db.script
3. Repeat steps 1-2 on the Primary Replica node (if present).
4. Run the following command on the Primary and Primary Replica (if present) nodes to restart the CaSA service:
   service vmware-casa restart

Internal Certificate Expired

If the vRealize Operations internal certificate has already expired, the vRealize Operations Certificate Renewal PAK file will need to be installed manually.
Complete the following steps on the vRealize Operations cluster while the cluster is in an Offline state. If you are unable to take the cluster offline through the Admin UI, contact VMware Support.
Note: Ensure all of the following steps are completed on all nodes in the vRealize Operations cluster unless noted otherwise.
1. Snapshot the vRealize Operations nodes by following How to take a Snapshot of vRealize Operations.
2. Download the Certificate Renewal PAK file for your version of vRealize Operations from the VMware Patch Portal.
   a. Select vROps Certificate Renewal as the Product.
   b. Select the vRealize Operations product version.
      Notes:
      - For versions 6.3 to 8.1.1, select version 8.0.0. This PAK is compatible with vRealize Operations versions 6.3 to 8.1.1.
      - For versions 8.4.x to 8.10.x, select version 8.4.0. This PAK is compatible with vRealize Operations versions 8.4 to 8.10.x.
3. Copy the vRealize Operations Certificate Renewal PAK file to the /tmp/ directory on all nodes in the vRealize Operations cluster using an SCP utility.
4. Log into all nodes in the vRealize Operations cluster as root via SSH or Console.
5. Run the following command on all nodes in the vRealize Operations cluster to make the necessary directories:
   mkdir -p /data/db/pakRepoLocal/vRealize_Operations_Manager_Enterprise_Certificate_Renewal/extracted
6. Unzip the vRealize Operations Certificate Renewal PAK file by running the following command on all nodes in the vRealize Operations cluster:
   unzip /tmp/vRealize_Operations_Manager_Enterprise_Certificate_Renewal-build.pak -d /data/db/pakRepoLocal/vRealize_Operations_Manager_Enterprise_Certificate_Renewal/extracted
   Note: Replace build with the build number of the downloaded vRealize Operations Certificate Renewal PAK file.
   Example:
   unzip /tmp/vRealize_Operations_Manager_Enterprise_Certificate_Renewal-8.0.0.15217416.pak -d /data/db/pakRepoLocal/vRealize_Operations_Manager_Enterprise_Certificate_Renewal/extracted
7. Stop all services by running the following commands:
   service vmware-vcops-watchdog stop
   service vmware-vcops stop
   Note: For versions 8.3 and later, ensure that all services have been stopped by running the vrops-status command. If a service is still running, kill it manually.
   Example: (vpostgres) is running (3557)
   Run this command to terminate the process:
   kill -9 3557

   Check that the is_admin property is set only for the Primary node in casa.db.script. On all nodes (including Remote Collectors and Witness), run the following command to verify the status of the is_admin property:
   sed -nre "/clusterMembership/ s/^[^']+'([^']+)','([^']+)'.*/\2/p" /storage/db/casa/webapp/hsqldb/casa.db.script | python -m json.tool
   In the output, "is_admin_node": true should only be set when "slice_name" is "MASTER". If true is set for other nodes, complete the following on all nodes (including Remote Collectors and Witness):
   a. Run service vmware-casa stop
   b. Edit /storage/db/casa/webapp/hsqldb/casa.db.script and ensure "is_admin_node" is set to true for the Primary node, and false for all other nodes.
   c. Run service vmware-casa start
8. The following command needs to be run in a particular order. Follow each sub-step carefully.
   Command:
   $VMWARE_PYTHON_BIN /data/db/pakRepoLocal/vRealize_Operations_Manager_Enterprise_Certificate_Renewal/extracted/updateCoordinator.py EXPIRED
   8.1. First, run the command on all Remote Collector nodes (if present) in the cluster, and wait for the task to complete. Continue to step 8.2.
   8.2. Next, run the command on all Data nodes, the Witness node (if present), and the Primary Replica node (if present) in the cluster. Do not wait for each node to complete; just start the command on all nodes. Once Waiting for certificate generation to complete appears on the last node, wait roughly 60 seconds, and continue to step 8.3.
   8.3. Finally, run the command on the Primary node. The expected behavior is for the command to finish; shortly afterwards, the pending tasks on the Data nodes and Primary Replica node (if present) will complete.
   Note: To confirm that the command completed successfully, check for the existence of the /var/vmware/_cert_generation_completed file on the Primary node.
9. Change the permissions of the newly generated certificates on all nodes in the vRealize Operations cluster by running the following commands:
   chown admin:admin -R /storage/vcops/user/conf/ssl/ /storage/vcops/user/conf/ssl_bak/ /storage/db/casa/webapp/hsqldb/
   chown -h root:root /storage/vcops/user/conf/ssl/web_cert.pem /storage/vcops/user/conf/ssl/web_chain.pem /storage/vcops/user/conf/ssl/web_key.pem
   chmod guo+r -R /storage/vcops/user/conf/ssl/
   chmod 444 /storage/vcops/user/conf/ssl/cacert.pem /storage/vcops/user/conf/ssl/slice_*_cert.pem
   chmod 400 /storage/vcops/user/conf/ssl/cakey.pem /storage/vcops/user/conf/ssl/slice_*_cert.pfx /storage/vcops/user/conf/ssl/slice_*_key.pem
   chmod 640 /storage/vcops/user/conf/ssl/tcserver.keystore
   Note: For version 8.4 and later, also run the following commands on the Primary node, the Primary Replica node (if present), and all Data nodes:
   chown postgres:root /storage/vcops/user/conf/ssl/postgres_vcopsrepl_*
   chmod 600 /storage/vcops/user/conf/ssl/postgres_vcops_key.pk8 /storage/vcops/user/conf/ssl/postgres_vcopsrepl_key.pem
   chmod 640 /storage/vcops/user/conf/ssl/postgres_vcops_cert.pem /storage/vcops/user/conf/ssl/postgres_vcopsrepl_cert.pem
10. Log into the vRealize Operations Admin UI as the local admin user.
11. Click Take Offline under Cluster Status.
12. If the cluster fails to go offline, click Force Offline under Cluster Status.
    Note: Wait for the Cluster Status to show as Offline.
13. Click Bring Online under Cluster Status.
    Note: Wait for the Cluster Status to show as Online.
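After running the permission commands, the resulting file modes can be spot-checked on each node. The following is a minimal sketch rather than part of the official procedure: the function names check_mode and verify_ssl_modes are hypothetical, GNU stat (with the -c option) is assumed to be available on the node, and only the modes set by the chmod 444, chmod 400, and chmod 640 commands above are checked. Ownership and the additional postgres files for version 8.4 and later are not covered.

```shell
# check_mode FILE EXPECTED_OCTAL_MODE
# Prints one status line and succeeds only when FILE exists with that mode.
check_mode() {
    if [ ! -e "$1" ]; then
        echo "MISSING $1"
        return 1
    fi
    cm_actual=$(stat -c '%a' "$1")   # GNU stat: permission bits in octal
    if [ "$cm_actual" = "$2" ]; then
        echo "OK $1 ($cm_actual)"
    else
        echo "MISMATCH $1 (found $cm_actual, expected $2)"
        return 1
    fi
}

# verify_ssl_modes [SSL_DIR]
# Checks the modes that the chmod commands in this article are expected to set.
verify_ssl_modes() {
    vs_dir="${1:-/storage/vcops/user/conf/ssl}"
    vs_rc=0
    check_mode "$vs_dir/cacert.pem" 444 || vs_rc=1
    check_mode "$vs_dir/cakey.pem" 400 || vs_rc=1
    check_mode "$vs_dir/tcserver.keystore" 640 || vs_rc=1
    for vs_file in "$vs_dir"/slice_*_cert.pem; do
        # Skip the literal pattern when no file matches the glob.
        if [ -e "$vs_file" ]; then
            check_mode "$vs_file" 444 || vs_rc=1
        fi
    done
    for vs_file in "$vs_dir"/slice_*_key.pem "$vs_dir"/slice_*_cert.pfx; do
        if [ -e "$vs_file" ]; then
            check_mode "$vs_file" 400 || vs_rc=1
        fi
    done
    return "$vs_rc"
}
```

Calling verify_ssl_modes with no argument checks the default /storage/vcops/user/conf/ssl directory; a non-zero exit status means at least one file is missing or has an unexpected mode, in which case the permission commands can be rerun.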
Note: After the certificate renewal, vRealize Operations retains the previous key in its truststore once the new certificate is generated. vRealize Operations will use both the old and new certificates for validation. Currently there is no revocation mechanism.
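Once the cluster is back Online, the end date of the renewed certificate can also be read from a shell instead of a browser. The sketch below is an illustration under stated assumptions: it assumes openssl is installed on the machine running it, that port 6061 on the Primary node presents the internal certificate over TLS (as the browser steps in this article imply), and that the Gemfire service is running. The function names fetch_cert, cert_end_date, and cert_valid_for are hypothetical.

```shell
# fetch_cert HOST [PORT]
# Writes the certificate presented on HOST:PORT (default 6061) to stdout as PEM.
fetch_cert() {
    openssl s_client -connect "$1:${2:-6061}" </dev/null 2>/dev/null |
        openssl x509 -outform PEM
}

# cert_end_date PEM_FILE
# Prints the expiry of the certificate in the form notAfter=<date>.
cert_end_date() {
    openssl x509 -noout -enddate -in "$1"
}

# cert_valid_for PEM_FILE SECONDS
# Succeeds when the certificate is still valid SECONDS from now;
# openssl x509 -checkend exits non-zero when it expires within that window.
cert_valid_for() {
    openssl x509 -noout -checkend "$2" -in "$1" >/dev/null
}
```

A possible usage, with Primary_Node_IP_or_FQDN replaced as in the browser steps: run fetch_cert Primary_Node_IP_or_FQDN > /tmp/internal_cert.pem, then cert_end_date /tmp/internal_cert.pem to print the end date, and cert_valid_for /tmp/internal_cert.pem 2592000 to test whether the certificate remains valid for at least another 30 days.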