How to configure Load Balancing on F5 for RADIUS Clients in Authentication Manager v. 8.5 or earlier
Summary: How to choose and configure a load balancer keepalive strategy for Authentication Manager (AM) v. 8.5 or earlier, with examples for the F5 load balancer.
Issue: Load balancers such as F5 and Citrix NetScaler send keepalive packets to RADIUS servers and determine availability from the responses to those keepalives. These keepalive packets can be as simple as a TCP three-way handshake (SYN, SYN-ACK, ACK), which is the beginning of any reliable network connection. For this, the load balancer just needs a TCP port number, such as 1813 or 1812 for RADIUS.
LB sends SYNchronize → to AM server port 1813
← SYNchronizeACKnowledgement AM server responds
A good load balancer closes this connection after the third packet of the handshake (its own ACKnowledgement), so that the AM server does not have to time out the connection.
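The TCP port touch described above can be sketched in a few lines of Python. This is only an illustration of what such a keepalive does, not how F5 or NetScaler implement their monitors; the function name and the host name in the usage note are hypothetical.

```python
import socket


def tcp_port_is_up(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP three-way handshake to host:port completes."""
    try:
        # create_connection() performs the SYN / SYN-ACK / ACK exchange.
        # Leaving the "with" block closes the socket right away, so the
        # server does not have to time out an idle connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out: treat the port as down.
        return False
```

For example, tcp_port_is_up("am-primary.example.com", 1813) would report whether the (hypothetical) AM server accepts connections on TCP 1813.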
Keepalives can also be as complex as a full Authentication Request with UserID and Passcode. In order to be successful, the UserID and Passcode must be valid. A fixed Passcode can be used here, though it is less secure, and is not 2-factor authentication.
Depending on which keepalive method is selected, AM admins need to be aware of scalability, reporting and performance issues that could affect the AM servers. For example, a TCP port touch with the three-way SYN, SYN-ACK and ACK handshake has the least impact on the AM server's resources. TCP SYN packets do not show in the RADIUS <date>.log files (e.g. /opt/rsa/am/radius/20210331.log for March 31, 2021) and do not show in the AM Authentication Real Time Monitor or reports. Depending on which port is chosen, a response can indicate either that the AM server is up, or that the RADIUS service on the AM server is up.
The most resource-intensive keepalive is a full authentication request with UserID and Passcode. These keepalives show in both the RADIUS <date>.log files and in the AM Authentication Real Time Monitor or reports. The frequency of these keepalives also comes into play.
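To make the cost of a full authentication keepalive more concrete, here is a minimal sketch of the RADIUS Access-Request packet that such a monitor has to build, following the packet layout and User-Password hiding of RFC 2865. It is not the F5 or NetScaler implementation. The function names and the default NAS-Identifier value "lb-monitor" are assumptions made for illustration, and the sketch only builds the packet; it does not send it.

```python
import hashlib
import os
import struct
from typing import Optional


def hide_password(password: bytes, secret: bytes, authenticator: bytes) -> bytes:
    """Obfuscate a User-Password value (RFC 2865, section 5.2)."""
    if len(password) > 128:
        raise ValueError("User-Password is limited to 128 bytes")
    # Pad with NULs to a multiple of 16 bytes (at least one block).
    size = max(16, -(-len(password) // 16) * 16)
    padded = password.ljust(size, b"\x00")
    hidden = b""
    previous = authenticator
    for i in range(0, size, 16):
        # Each block is XORed with MD5(secret + previous ciphertext block);
        # the first block uses the Request Authenticator instead.
        digest = hashlib.md5(secret + previous).digest()
        block = bytes(p ^ d for p, d in zip(padded[i:i + 16], digest))
        hidden += block
        previous = block
    return hidden


def attribute(attr_type: int, value: bytes) -> bytes:
    """Encode one attribute as Type, Length, Value."""
    return struct.pack("!BB", attr_type, len(value) + 2) + value


def build_access_request(user: str, passcode: str, secret: bytes,
                         identifier: int = 1,
                         authenticator: Optional[bytes] = None,
                         nas_id: str = "lb-monitor") -> bytes:
    """Build an Access-Request (code 1) with User-Name, User-Password
    and NAS-Identifier attributes."""
    authenticator = authenticator or os.urandom(16)
    attrs = (
        attribute(1, user.encode())                       # User-Name
        + attribute(2, hide_password(passcode.encode(),   # User-Password
                                     secret, authenticator))
        + attribute(32, nas_id.encode())                  # NAS-Identifier
    )
    length = 20 + len(attrs)  # 20-byte header plus attributes
    return struct.pack("!BBH", 1, identifier, length) + authenticator + attrs
```

Every such packet makes the AM server look up the UserID and check the passcode, which is why this keepalive shows up in the logs and the Real Time Monitor while a TCP port touch does not.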
Your tasks here are to:
- decide which load balancer approach scales best for your site/realm
- configure one of the three approaches to RADIUS client load balancing described in the Resolution
Depending on your requirements, you might configure user test logons with a fixed passcode every 5 to 10 minutes (300 to 600 seconds), with resulting log entries. Or you might configure a TCP port keepalive touch with no log impact.
As you will see in the Resolution, F5 provides a third approach that sits between the TCP port touch and the full authentication with UserID.
Resolution: In general, you have three approaches or options for RADIUS keepalives:
1. TCP SYN to a specific AM port. You might use TCP 1813, the RADIUS accounting port, or 1812 as a simple way to determine that an AM RADIUS server is up and can be used in a RADIUS load-balancing pool. Refer to the documentation from your RADIUS client to determine specifics on how to configure this, but typically it's just a TCP port number and a frequency (see Notes for frequency cautions).
Minimal AM server impact with no AM configuration needed. Excellent Scalability.
Note: when AM 8.6 is released around July 2021, the Pulse SBR RADIUS currently used in AM 8.5 or earlier will be replaced by FreeRADIUS, which will use AM replication. Therefore, TCP ports 1812 and 1813 will not be up and listening on an AM 8.6 server by default (nor will the alternate RADIUS ports, TCP 1645 and 1646).
2. A full Authentication Request with UserID and Fixed Passcode, such as the F5 RADIUS Monitor or Citrix NetScaler User logon for High Availability. The Load balancer is looking for a response to indicate RADIUS is up and processing Authentication requests, even if the response is Access Denied.
Maximum AM server impact with poor scalability, so be careful with the frequency of these authentications. In a large network with thousands of RADIUS clients, make sure that your high-availability RADIUS test authentications do not make your AM RADIUS server unavailable by consuming the resources that were planned for real users. See the Real World impact story (marked *) in Note 2 below.
3. The UDP Monitor on F5 can send a null string to a UDP port such as 1812, which triggers a reject response and puts a "Truncated authentication request" entry in the RADIUS date log, but shows nothing in the Authentication Manager Real Time Monitor or Authentication reports.
Scales better than a full authentication request, but you still need to be careful when dealing with thousands of RADIUS load balancers, or with intervals shorter than 5 minutes when you have a dozen or more load balancers. This is a simple math problem: the frequency of the test multiplied by the number of load balancers gives you an idea of the load your testing puts on the AM servers. The default interval on F5 is 300 seconds.
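The math mentioned above is easy to work through. The following sketch assumes that every load balancer probes every AM server at the same fixed interval; the function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def probes_per_day(load_balancers: int, interval_seconds: float) -> int:
    """Estimate keepalive requests per AM server per day, assuming each
    load balancer probes each AM server once per interval."""
    return round(load_balancers * SECONDS_PER_DAY / interval_seconds)


print(probes_per_day(12, 300))    # 3456: a dozen load balancers at the F5 default
print(probes_per_day(1000, 300))  # 288000: a thousand load balancers at the same interval
```

Halving the interval doubles the result, so shortening the interval and adding load balancers multiply together.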
Note 1 - Load balancers use the term 'stickiness' to refer to using the same server from the load balance pool for multi-step procedures. Authentication is typically a single-step, discrete action: the user sends UserID and Passcode to any AM server to authenticate, and a single response from an AM server completes the transaction. However, with New PIN mode or a PIN Change Policy in effect, some transactions become multi-step:
step a) user enters ID and Passcode (or tokencode if in New PIN mode)
step b) AM server authenticates user, but includes prompt for something else, e.g. create New PIN or enter Next Token Code
step c) the second request from the RADIUS client, with the new PIN, must go to the same AM RADIUS server that authenticated the user in steps a and b. This is the concept of stickiness. See your load balancer documentation for configuration details.
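One common way to get this behavior is source-address affinity, where the load balancer always maps a given RADIUS client address to the same pool member. The sketch below illustrates the idea only; it is not how F5 or NetScaler implement persistence, and the function name, pool member names and address are hypothetical.

```python
import hashlib
from typing import Sequence


def pick_server(client_ip: str, pool: Sequence[str]) -> str:
    """Map a RADIUS client address to one pool member, deterministically."""
    # Hashing the source address means steps a, b and c from the same
    # RADIUS client always land on the same AM server, as long as the
    # pool membership does not change.
    digest = hashlib.sha256(client_ip.encode()).digest()
    index = int.from_bytes(digest[:8], "big") % len(pool)
    return pool[index]


pool = ["am-primary", "am-replica1", "am-replica2"]
server = pick_server("192.0.2.10", pool)
# Repeated calls for the same client return the same server.
assert pick_server("192.0.2.10", pool) == server
```

The trade-off of this approach is that all traffic from one RADIUS client goes to a single AM server, so the load is only balanced across clients, not across individual requests.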
Note 2 - * Real World impact story
A real-world example of keepalive frequency and its impact was a customer support case where 9 Citrix NetScalers were configured to send authentication requests every 10-15 seconds. This resulted in over 92,000 authentication requests every 24 hours. The NetScalers were not configured with an actual UserID; they simply used the UserID 'test' with a made-up, invalid fixed passcode. This meant that every day the AM servers processed 92,000 failed authentication requests for the non-existent UserID test, all of which failed to resolve the user name after performing name lookups in both the internal AM database and all external LDAP identity sources. The Real Time Authentication Monitor was overflowing with these failed authentication messages, making it difficult to nearly impossible to search for real authentication failures. The fact that the authentications failed did not matter; the NetScalers only needed a response to keep the AM servers in their active server list.
Note 3 - The TCP ports should not be accessible to any systems other than other RSA appliances, including proxies such as load balancers. In general you want to protect all TCP ports, even from RADIUS clients, which only need to authenticate to UDP port 1812 or 1645. While a load balancer such as an F5 can use TCP port 1812 for keepalives, all firewalls (and load balancers) should prevent pass-through access to these TCP ports.
The RSA® Authentication Manager 8.5 Planning Guide says of TCP port 1812, "[t]his port is used for communication between primary RADIUS and replica RADIUS services. If you do not use RSA RADIUS, but you have replica instances, you must allow connections between Authentication Manager instances on this port. You should restrict connections from other systems that are not Authentication Manager instances. For more information, see Required RSA RADIUS Server Listening Ports."
The TCP-based health check is no longer viable with AM 8.6 and newer. Consider the deeper health check with a fixed passcode as an option for the best accuracy, at the cost of some limited impact. A check every 5-10 seconds will have minimal impact.