Replica Decommission SDCONF.REC Question
We will be decommissioning one of our replica servers, and we have many agents. Do I need to generate a new sdconf.rec file so that all of the agents update and stop attempting to authenticate against the decommissioned replica?
Today I decommissioned a replica to rebuild it with a new IP, and it broke our webmail authentication.
An error page was displayed until we applied the new sdconf.rec file.
I was told the sdconf.rec file should update automatically, which is what confuses me.
Thanks
- Tags:
- AM
- Auth Manager
- Authentication Manager
- Community Thread
- decommission
- Discussion
- Forum Thread
- replica
- replica decommission
- RSA Authentication Manager
- RSA SecurID
- RSA SecurID Access
- sdconf.rec
- SecurID
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Basic rundown:
If all RSA servers are operational and replicating normally
and
If all RSA agents can reach any RSA server on the network at all times
Then: the sdconf.rec tells an agent where the primary and replicas are located. When the agent needs to perform an authentication, it contacts one of the RSA servers, performs the authentication, and records how quickly that server responded. The next time it needs to authenticate, it randomly chooses an RSA server and again records the time. Over time, the agent begins to prefer a faster-responding RSA server, but it still tries them all on occasion. If one does not respond, it simply tries another one in the list; only when no RSA server responds does the authentication fail.
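To make the behavior concrete, here is a minimal Python sketch of that selection logic. It is an illustration only: the explore probability and the timing bookkeeping are assumptions, and the real agent persists its measurements in the sdstatus.* file discussed below.

```python
import random
import time

class AgentServerList:
    """Toy model of the server-selection behavior described above."""

    def __init__(self, servers):
        # The initial server list comes from sdconf.rec; no timings yet.
        self.response_times = {s: None for s in servers}

    def pick_order(self, explore_chance=0.1):
        # Occasionally shuffle so every server still gets sampled;
        # otherwise prefer the fastest-responding server first.
        servers = list(self.response_times)
        if random.random() < explore_chance or all(
            t is None for t in self.response_times.values()
        ):
            random.shuffle(servers)
        else:
            servers.sort(key=lambda s: self.response_times[s] or float("inf"))
        return servers

    def authenticate(self, try_auth):
        # Walk the list until one server answers; the whole attempt
        # fails only if every server fails to respond.
        for server in self.pick_order():
            start = time.monotonic()
            if try_auth(server):  # try_auth returns False on no response
                self.response_times[server] = time.monotonic() - start
                return server
        raise RuntimeError("no RSA server responded")
```

In this model a dead replica just costs the agent an occasional short wait before it falls through to the next server in the order, which is the behavior described next.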
So, dropping a replica should not cause a web agent to fail completely. At most one authentication might fail, but if things are set up correctly the agent will simply try the primary and keep authenticating. You do not need a new sdconf.rec; the agent might try the stale replica once in a while, but it will prefer the primary most of the time. When it does try the missing replica, it will wait a few milliseconds for an answer, then simply try the primary, and the user will be authenticated. None of this should amount to an authentication outage.
So, in this state, you can run network checks or review the authentication activity logs to see whether that web agent ever authenticated against the primary, or whether for some reason it was only able to authenticate against the replica; in other words, whether something is not quite right with the "can reach any RSA server at all times" assumption.
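If your activity data can be exported to CSV, a quick tally like the hypothetical sketch below can show which servers the agent actually used. The file name and the "Server Node" column are assumptions; match them to whatever your actual export contains.

```python
import csv
from collections import Counter

# Hypothetical check: tally which RSA server handled each authentication,
# using a CSV export of the authentication activity report.
counts = Counter()
with open("auth_activity.csv", newline="") as f:
    for row in csv.DictReader(f):
        counts[row["Server Node"]] += 1  # column name is an assumption

for server, n in counts.most_common():
    print(f"{server}: {n} authentications")
```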
But if you want to straighten it all out: yes, replace sdconf.rec with a current one that lists the current RSA servers, and find and delete any file named sdstatus.* on the agent. The sdstatus file handles the load balancing; it is what knows which RSA servers respond faster or slower. When you have issues like this, deleting sdstatus cleans out any knowledge of stale RSA servers and forces the agent to read sdconf.rec again; when it contacts the primary, the primary tells the agent, "this is my current list of replicas."
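A minimal sketch of that cleanup step, assuming a hypothetical agent directory (the real location varies by agent type, version, and platform):

```python
import glob
import os

# Placeholder path: adjust to your agent's actual install directory.
AGENT_DIR = r"C:\Program Files\RSA Security\RSA Authentication Agent"

for path in glob.glob(os.path.join(AGENT_DIR, "sdstatus.*")):
    os.remove(path)  # forces the agent to re-read sdconf.rec on the next auth
    print("deleted", path)
```

Copy the fresh sdconf.rec into place afterwards; the next authentication rebuilds the status data from the current server list.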
Summary: Normally you can add and delete replicas and the agents figure it out automatically; nothing needs to be done on the agent side. But when problems occur, you can deploy a new sdconf.rec and delete the sdstatus.* files to force the agent to use only the newest information.
This scenario was probably a perfect storm, then.
The full scenario:
Original Primary: RSA01
1. Promoted a replica to primary (RSA02)
2. Shut down the old primary (now replica) server RSA01
3. Deleted the replica from the new primary RSA02
4. Rebuilt the RSA01 server with a new IP (Same network as RSA02)
5. Joined RSA01 back into the cluster as a replica
- This is the point where we were notified about the error message
6. Promoted RSA01 back to primary after all replication had completed
7. Generated a new sdconf.rec file and applied it to the webmail proxy that prompts for RSA authentication, restoring access
I imagine there was insufficient time for the file on the webmail proxy to update, which created this issue.
Thank you for your explanation!
Hi EdwardDavis
Great explanation of the sdconf.rec file.
We have 1 primary and 5 replicas. When I looked at the sdconf.rec file on all 4 AMIS servers, I saw only the primary server name; all 5 replica names are missing. To investigate further, I pulled the authentication activity report for the last 10 days. Surprisingly, all 4 AMIS servers are sending authentication requests to all 5 replica servers.
Questions:
1. How are the AMIS agents able to authenticate using the other 5 replica servers?
2. If I open port 5580 from the AMIS agents to the RSA AM servers, will the sdconf.rec on the agents get updated automatically? (As per the note on port 5580 on the portal.)
3. I am unable to find the sdstatus file on the AMIS servers. Am I missing anything?
https://rsa.jiveon.com/docs/DOC-77445
"Used to receive requests for additional offline authentication data, and send the offline data to agents. Also used to update server lists on agents. This can be closed if offline authentications are not in use and no agents in your deployment use the Login Password Integration API."
Thanks
Mo
