Do we have a process to enable alerts when one of the services is shut down? Critical System Events Notifications doesn't notify us about services. One service went down on the primary instance and RSA raised no alert, which caused a production outage. We are looking for a solution that will notify us when a service is shut down. Is anything captured in syslog? If so, we will try to enable notifications in Splunk based on that log event.
Hi,
Unfortunately, that is one thing the product lacks: a direct notification or indicator that a specific RSA service has stopped or failed to start. However, we do send syslog messages that are relevant to services stopping.
These are the services on a primary with a working replica:
./rsaserv status
RSA Database Server [RUNNING]
RSA Administration Server with Operations Console [RUNNING]
RSA RADIUS Server Operations Console [RUNNING]
RSA Runtime Server [RUNNING]
RSA RADIUS Server [RUNNING]
RSA Console Server [RUNNING]
RSA Replication (Primary) [RUNNING]
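If you want to poll this output yourself, a minimal sketch along the following lines could flag any service that is not reported as [RUNNING]. It assumes the output keeps the "name [STATE]" layout shown above; the function names are hypothetical, and since the exact wording of the non-running state is not confirmed here, anything other than RUNNING is treated as a problem.

```python
import re
import subprocess

# Matches lines such as "RSA Runtime Server [RUNNING]".
STATUS_LINE = re.compile(r"^(?P<name>.+?)\s*\[(?P<state>[^\]]+)\]\s*$")


def parse_status(output):
    """Return {service name: state} parsed from `rsaserv status` text."""
    services = {}
    for line in output.splitlines():
        match = STATUS_LINE.match(line.strip())
        if match:
            services[match.group("name")] = match.group("state").strip().upper()
    return services


def services_not_running(output):
    """Return the sorted names of services whose state is not RUNNING."""
    statuses = parse_status(output)
    return sorted(name for name, state in statuses.items() if state != "RUNNING")


def read_status(command=("./rsaserv", "status")):
    """Run the status command and return its stdout.

    The command path is an assumption: run this from the directory
    that contains rsaserv, or pass the full path.
    """
    result = subprocess.run(list(command), capture_output=True, text=True, check=False)
    return result.stdout
```

Calling services_not_running(read_status()) from a scheduled job and alerting on a non-empty list would be one way to use it.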
The system writes syslog events that are symptoms of one of these services stopping, as in an ordinary shutdown.
Some examples of messages logged when one or more services are shutting down:
UDPserver,warn (auth port 5500/udp is going down)
TCPServer,warn (auth port 5500/tcp is going down)
MessageProcessorImpl, warn
configureSNMP.sh disable
am.MessageKeyManager, warn
am.AMAdjudicatorManager, info
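Before building a Splunk search on these, it may help to prototype the match. The sketch below only checks whether a log line contains one of the component names listed above together with its severity. The full syslog line layout is not shown here, so the substring matching, the function names, and the two-marker threshold are all assumptions.

```python
# (component, severity) pairs taken from the example messages above.
# An empty severity means "match on the component text alone".
SHUTDOWN_MARKERS = [
    ("UDPserver", "warn"),
    ("TCPServer", "warn"),
    ("MessageProcessorImpl", "warn"),
    ("configureSNMP.sh disable", ""),
    ("am.MessageKeyManager", "warn"),
    ("am.AMAdjudicatorManager", "info"),
]


def matching_marker(line):
    """Return the first component whose name and severity both appear in the line, else None."""
    lowered = line.lower()
    for component, severity in SHUTDOWN_MARKERS:
        if component.lower() in lowered and severity in lowered:
            return component
    return None


def looks_like_shutdown(lines, min_markers=2):
    """Return True when at least min_markers distinct components appear in the lines.

    Requiring more than one marker is an assumed way to cut false positives,
    since some of these components may also log during normal operation.
    """
    seen = set()
    for line in lines:
        component = matching_marker(line)
        if component is not None:
            seen.add(component)
    return len(seen) >= min_markers
```

Feeding looks_like_shutdown() a short window of recent lines and tuning min_markers against your own logs would be the first thing to try.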
Probably the easiest and best way to monitor the AM servers (in my humble opinion) is to run SNMP gets on the replication status. This is likely the most important thing you can probe. If the replication state is [healthy], the rest of the server and the replica(s) can largely be assumed to be up and running smoothly, because these systems will not replicate if other services are having a bad day.
Another way is to set up a 'RADIUS client robot' that logs in with a special account whose user ID has no access to anything else in the company, using a fixed passcode to perform a RADIUS authentication every [interval]. One example is radlogin4, which can alert via SMS or email if it cannot authenticate.
A robot user that lives in an external identity source, has a fixed passcode, and performs a RADIUS login tests three things: that the connection to the external identity source is valid, that port 5500/udp is up and authenticating, and that the RADIUS engine can receive incoming authentication requests and respond with Access-Accept.
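If you would rather script the robot than use an existing tool, the scheduling and alerting part could look roughly like this. It is a sketch only: authenticate and alert are hypothetical callables you would supply (for example, a wrapper around whatever RADIUS client you use, and a function that sends email or SMS), and the interval and failure threshold are assumed defaults.

```python
import time


def run_probe(authenticate, alert, interval_seconds=300, max_failures=2,
              cycles=None, sleep=time.sleep):
    """Call authenticate() every interval and alert after repeated failures.

    authenticate: hypothetical callable returning True on Access-Accept.
    alert: hypothetical callable taking a message string.
    cycles: number of probes to run; None means run forever.
    Returns the count of consecutive failures when the loop ends.
    """
    failures = 0
    done = 0
    while cycles is None or done < cycles:
        try:
            ok = bool(authenticate())
        except Exception:
            # A timeout or network error counts as a failed login.
            ok = False
        if ok:
            failures = 0
        else:
            failures += 1
            # Alert once when the threshold is reached, not on every failure.
            if failures == max_failures:
                alert(f"RADIUS robot login failed {failures} times in a row")
        done += 1
        if cycles is None or done < cycles:
            sleep(interval_seconds)
    return failures
```

Waiting for a second consecutive failure before alerting is a design choice to avoid paging on a single dropped UDP packet; set max_failures=1 if you want to hear about every miss.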