Without warning or recent changes, some hosts and services appear offline in the NetWitness UI.
/var/log/messages on the affected hosts shows repeated SSL session errors for the services, as below.
Feb 1 23:25:56 archiver NwWorkbench: [Network] [audit] Could not establish a SSL session (error:14094416:SSL routines:ssl3_read_bytes:sslv3 alert certificate unknown) for x.x.x.x:53576
Feb 1 23:25:56 archiver NwArchiver: [Network] [audit] Could not establish a SSL session (error:14094416:SSL routines:ssl3_read_bytes:sslv3 alert certificate unknown) for x.x.x.x:58766
The issue continues even after restarting the affected services.
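To see which services are logging this error, the messages can be counted per service. The following is a minimal sketch, assuming the syslog layout shown above (month, day, time, host, then the service name followed by a colon); count_ssl_errors is a hypothetical helper name, not part of NetWitness.

```shell
# Count "Could not establish a SSL session" messages per service.
# count_ssl_errors is a hypothetical helper; the field position of the
# service name ($5) is an assumption based on the sample log lines.
count_ssl_errors() {
    log_file="${1:-/var/log/messages}"
    grep -F 'Could not establish a SSL session' "$log_file" |
        awk '{
            svc = $5
            # Drop the trailing colon and an optional [pid] suffix.
            sub(/(\[[0-9]+\])?:$/, "", svc)
            count[svc]++
        }
        END { for (s in count) print s, count[s] }' |
        sort
}
```

Calling count_ssl_errors with no argument reads /var/log/messages; a different file can be passed as the first argument.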
The issue can be caused by an expired /etc/pki/nw/node/node-cert.pem on the affected hosts. Even in environments where the service certificates were recently renewed per Sys Maintenance: Reissue Certificates, /etc/pki/nw/node/node-cert.pem has been found not to be updated with the renewed certificate on some hosts.
Run the following command on the Admin server and the affected hosts to confirm the expiry date: keytool -printcert -file /etc/pki/nw/node/node-cert.pem | grep -Ei "owner|valid"
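The expiry date can also be read with openssl, which additionally reports through its exit status whether the certificate has already expired. This is a sketch rather than the article's own procedure: check_cert_expiry is a hypothetical helper name, and it assumes openssl is installed on the host and that the file is a PEM-encoded X.509 certificate.

```shell
# Print the expiry date of a certificate and report whether it has expired.
# check_cert_expiry is a hypothetical helper; only standard
# "openssl x509" options (-enddate, -checkend) are used.
check_cert_expiry() {
    cert="${1:-/etc/pki/nw/node/node-cert.pem}"
    if [ ! -r "$cert" ]; then
        echo "cannot read $cert" >&2
        return 2
    fi
    # Prints a line of the form "notAfter=<date>".
    openssl x509 -in "$cert" -noout -enddate || return 2
    # -checkend 0 exits non-zero if the certificate is already expired.
    if openssl x509 -in "$cert" -noout -checkend 0 >/dev/null; then
        echo "certificate is still valid"
    else
        echo "certificate has EXPIRED"
        return 1
    fi
}
```

The function returns 0 for a valid certificate, 1 for an expired one, and 2 if the file cannot be read or parsed.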
To resolve the issue, download the attached zip file, extract it, and install it on the Admin server and all target hosts. Then follow the instructions below to renew the certificates on all hosts.
cert_rescue (nw-rescue-cert)
For use only as a last resort to recover from expired service certificates.
This tool should NOT be used to reissue certificates. Run cert-reissue prior to service certificate expiration to avoid the need for recovery.
This tool should NOT be used if the system CA certificate(s) have expired (see root-ca-update instead).
Prerequisites:
11.6+: rescue utility pre-installed (bundled with rsa-sa-tools package), no additional setup needed
11.5 (or an older release that supports cert-reissue, namely 11.3 and higher): install the hotfix package (rsa-nw-cert-rescue-hotfix) on all nodes
Run salt "*" test.ping on node-zero and ensure that all nodes are reachable.
NOTE: rsa-sa-tools obsoletes rsa-nw-cert-rescue-hotfix (i.e., hotfix package will be replaced on upgrade to 11.6+).
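For the reachability check in the prerequisites, the ping results can be filtered so that only nodes that did not answer are listed. This is a sketch under assumptions: unreachable_minions is a hypothetical helper, and it expects one "minion-id: result" line per node, which is the form Salt's txt outputter (--out=txt) is assumed to produce here.

```shell
# List minion ids whose test.ping result is anything other than "True".
# unreachable_minions is a hypothetical helper; the "id: result" input
# format is an assumption about the output of "salt --out=txt".
unreachable_minions() {
    awk -F': ' 'NF && $2 != "True" { print $1 }'
}

# Usage on node-zero (not executed here):
#   salt --out=txt "*" test.ping | unreachable_minions
```

Empty output means every node that responded returned True.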
Automated (Centralized) Recovery Steps:
Rescue node-zero. Run on node-zero: nw-rescue-cert exec-rescue-local -p <deployment-password>
Rescue remote node-x. Run on node-zero: nw-rescue-cert exec-rescue-remote -p <deployment-password>
NOTE: This will update ALL node-x. To update specific node(s) only, run: nw-rescue-cert exec-rescue-remote -p <deployment-password> --node-id <node-uuid-one> <node-uuid-two> ...
Reissue all certificates to restore functionality. Run on node-zero: cert-reissue --host-all --skip-health-checks
Confirm the updated expiry date. Run on node-x: keytool -printcert -file /etc/pki/nw/node/node-cert.pem | grep -Ei "owner|valid"
Reboot node-zero. Run on node-zero: reboot
Review /var/log/netwitness/config-management/cert-rescue-cm.log if any issues occur while running the tool.
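The rescue and reissue commands that run on node-zero can be chained so that a later step does not start if an earlier one fails. The wrapper below is only an illustrative sketch: rescue_all and the DRY_RUN variable are hypothetical names, not part of nw-rescue-cert. By default it only prints the commands, with the password masked. Confirming the expiry date on each node-x and rebooting node-zero are left as manual steps.

```shell
# Print (DRY_RUN=1, the default) or run (DRY_RUN=0) the node-zero steps in order.
# rescue_all and DRY_RUN are hypothetical; the commands themselves are the
# ones listed in the recovery steps above.
rescue_all() {
    pw="$1"   # deployment password
    if [ -z "$pw" ]; then
        echo "usage: rescue_all <deployment-password>" >&2
        return 2
    fi
    for step in \
        "nw-rescue-cert exec-rescue-local -p" \
        "nw-rescue-cert exec-rescue-remote -p"
    do
        # Never echo the real password.
        echo "+ $step <deployment-password>"
        if [ "${DRY_RUN:-1}" = "0" ]; then
            # $step is intentionally unquoted so it splits into command and arguments.
            $step "$pw" || { echo "failed: $step" >&2; return 1; }
        fi
    done
    echo "+ cert-reissue --host-all --skip-health-checks"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        cert-reissue --host-all --skip-health-checks || return 1
    fi
}
```

Running rescue_all with a password and no DRY_RUN setting shows the three commands without executing anything; setting DRY_RUN=0 executes them and stops at the first failure.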