If you need to achieve high availability (HA) through load balancing and failover for Virtual Log Collectors (VLCs) on AWS, you can use the built-in AWS load balancer. I have tested this scenario, so I am going to share the outcome here.
Before starting, I need to state that VLC failover/load balancing is not functionality officially supported by RSA. Furthermore, this only works with "push" collection methods such as syslog, SNMP, etc. It does not work with "pull" collection methods such as Windows, Checkpoint, ODBC, etc. (at least not as far as I am aware, and I have never tested it personally).
That being said, let's get started.
As you may be aware, AWS EC2 is organized into separate geographic areas called Regions (I am using US East - N. Virginia here), and each Region contains multiple isolated locations called Availability Zones.
We are going to leverage this concept by placing the two VLCs in two different Availability Zones. If one VLC fails, the VLC in the other Availability Zone takes over.
The following diagram illustrates the scenario (for clarity, I omitted the data flow from the VLCs to the Log Decoder(s)):
Assuming you have already deployed the two VLC instances, the next step is to create two subnets, each associated with a different Availability Zone.
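Because failover depends on the two VLCs living in different Availability Zones, a quick check of the subnet layout before creating the load balancer can save some troubleshooting. The sketch below is a hypothetical helper (the function name, subnet IDs and zone names are placeholders, not values from my environment). It assumes you have already collected each subnet's Availability Zone, for example from the VPC console, and only checks that the subnets span at least two zones with one subnet per zone, since an AWS load balancer accepts a single subnet per Availability Zone.

```python
from __future__ import annotations

from collections import Counter


def check_subnet_layout(subnets: dict[str, str]) -> list[str]:
    """Return a list of problems with a {subnet_id: availability_zone} mapping.

    An empty list means the subnets span at least two Availability Zones,
    with exactly one subnet per zone.
    """
    problems = []
    per_zone = Counter(subnets.values())
    if len(per_zone) < 2:
        problems.append("subnets must span at least two Availability Zones")
    for zone, count in sorted(per_zone.items()):
        if count > 1:
            problems.append(
                f"{zone} has {count} subnets; the load balancer accepts one per zone"
            )
    return problems


if __name__ == "__main__":
    # Hypothetical subnet IDs for the two VLC subnets in US East (N. Virginia).
    layout = {"subnet-aaaa1111": "us-east-1a", "subnet-bbbb2222": "us-east-1b"}
    print(check_subnet_layout(layout) or "layout OK")
```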
- Next, we need to create a Target Group (from the EC2 menu), which the load balancer will use to route requests to our registered targets (the VLCs):
- Finally, we need to create the load balancer itself. For this test I used a Network Load Balancer; an Application Load Balancer would not work here, because it only supports HTTP and HTTPS listeners and cannot forward raw syslog over TCP. I selected an internal load balancer. I chose syslog over TCP port 514, so I created a listener for that port. At the time of my test the AWS load balancer did not support UDP (Network Load Balancers have since added UDP listeners), so I was forced to use TCP; however, I would have chosen syslog over TCP anyway, because it is more robust and reliable and can carry large syslog messages, which matters especially in a production environment. I also selected the appropriate VPC and the corresponding Availability Zones (and subnets).
In the advanced health check settings I chose port 5671 (by default the balancer would have used the same port as the listener, 514). I chose 5671 because the whole log collection mechanism relies on RabbitMQ, which uses this port. A health check on port 514 would only fail if the VLC instance were down or if syslog collection were stopped, whereas RabbitMQ, in my opinion, is more prone to failures and can fail in more scenarios, such as queues filling up because the Log Decoder is not consuming the logs, full partitions, network issues, etc.
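A TCP health check like this one only verifies that a connection to the chosen port can be opened; no application data is exchanged. The sketch below mimics that idea so you can compare ports 514 and 5671 on each VLC from a host inside the VPC before relying on the load balancer. The function name and timeout are illustrative assumptions, and this is not how AWS implements its health checker internally.

```python
import socket


def tcp_port_healthy(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port opens within timeout.

    The connection is opened and immediately closed, which is the basic
    idea behind a TCP target health check.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # VLC addresses used in the test below; 514 is the syslog listener,
    # 5671 is the RabbitMQ port used for the health check.
    for vlc in ("172.24.185.105", "172.24.185.126"):
        for port in (514, 5671):
            state = "healthy" if tcp_port_healthy(vlc, port) else "unhealthy"
            print(f"{vlc}:{port} {state}")
```

Stopping RabbitMQ on a VLC should make the 5671 probe fail even if 514 still accepts connections, which is the difference the choice of health check port is meant to capture.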
- Once the load balancer configuration is finished, you will see something similar to the following:
We need to take note of the DNS A Record, as this is the address our event sources will send syslog traffic to.
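If you prefer to script the console steps above, the same setup can be expressed against the Elastic Load Balancing v2 API. The sketch below assumes a boto3 `elbv2` client is passed in (it is duck-typed, so it can be tried with a stub before touching a real account); the resource names, VPC, subnet and instance IDs are hypothetical placeholders. It creates the TCP 514 target group with a TCP health check on port 5671, registers the two VLCs, creates an internal Network Load Balancer in the two subnets, adds the TCP 514 listener, and returns the DNS name to note down for the event sources.

```python
from typing import Any, Sequence


def provision_vlc_balancer(
    elbv2: Any,
    vpc_id: str,
    subnet_ids: Sequence[str],
    vlc_instance_ids: Sequence[str],
    name: str = "vlc-syslog",  # hypothetical resource name
) -> str:
    """Create an internal NLB forwarding syslog on TCP 514 to the VLCs.

    Returns the load balancer DNS name used by the event sources.
    """
    # Target group: traffic on TCP 514, health check on the RabbitMQ port.
    tg = elbv2.create_target_group(
        Name=f"{name}-tg",
        Protocol="TCP",
        Port=514,
        VpcId=vpc_id,
        TargetType="instance",
        HealthCheckProtocol="TCP",
        HealthCheckPort="5671",
    )
    tg_arn = tg["TargetGroups"][0]["TargetGroupArn"]

    # Register both VLC instances as targets.
    elbv2.register_targets(
        TargetGroupArn=tg_arn,
        Targets=[{"Id": iid} for iid in vlc_instance_ids],
    )

    # Internal Network Load Balancer spanning one subnet per Availability Zone.
    lb = elbv2.create_load_balancer(
        Name=name,
        Subnets=list(subnet_ids),
        Scheme="internal",
        Type="network",
    )["LoadBalancers"][0]

    # Listener on TCP 514 forwarding to the target group.
    elbv2.create_listener(
        LoadBalancerArn=lb["LoadBalancerArn"],
        Protocol="TCP",
        Port=514,
        DefaultActions=[{"Type": "forward", "TargetGroupArn": tg_arn}],
    )
    return lb["DNSName"]
```

Against AWS you would pass a real client, for example `boto3.client("elbv2", region_name="us-east-1")`, together with your own VPC, subnet and instance IDs.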
- Now, to configure an event source to send syslog to the load balancer, you just need to point the event source at the load balancer DNS A Record on TCP port 514. For example, on a Red Hat Linux machine you should edit the /etc/rsyslog.conf file as follows:
We use @@ because the transport is TCP; for UDP it is a single @.
Then we need to restart the rsyslog service as follows:
--> service rsyslog restart (Red Hat 6)
--> systemctl restart rsyslog (Red Hat 7)
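To see roughly what `@@` forwarding amounts to on the wire, here is a minimal sketch that sends syslog messages over a single TCP connection using newline-delimited (non-transparent, RFC 6587) framing, which, as far as I know, is what rsyslog uses by default for plain TCP forwarding. The function name, the simplified message format (a `<PRI>` prefix only, no timestamp or hostname header) and the DNS placeholder are illustrative assumptions; this is not NwLogPlayer.

```python
import socket
from typing import Iterable


def send_syslog_tcp(host: str, port: int, messages: Iterable[str], pri: int = 13) -> int:
    """Send messages over one TCP connection, one newline-terminated line each.

    Each message gets a <PRI> prefix (13 = facility user, severity notice).
    Returns the number of messages sent.
    """
    sent = 0
    with socket.create_connection((host, port), timeout=10) as conn:
        for msg in messages:
            # Drop the line ending and flatten embedded newlines so that one
            # event never spans two frames.
            body = msg.rstrip("\r\n").replace("\n", " ")
            conn.sendall(f"<{pri}>{body}\n".encode("utf-8"))
            sent += 1
    return sent


if __name__ == "__main__":
    # Hypothetical placeholder: replace with your load balancer DNS A Record.
    LB_DNS = "vlc-lb.example.internal"
    print(send_syslog_tcp(LB_DNS, 514, ["test message from event source"]))
```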
- To perform a more accurate and controlled test and demonstration, I am installing a tool on the same event source, which I will use to push some rhlinux logs to the load balancer and see what happens. The tool is proprietary to RSA and is called NwLogPlayer (more details in How To Replay Logs in RSA NetWitness). It can be installed via Yum if you have enabled the RSA NetWitness repo:
I also prepared a sample rhlinux log file with 14000 events, which I am going to inject into the load balancer. Initially, my Log Decoder LogStats page is empty:
Then I start with the first push of the 14000 events:
Now I can see that the first 14000 events went to VLC2 (172.24.185.126).
At the second push, the whole chunk went to VLC1 (172.24.185.105).
At the third push, the logs went to VLC2 again.
At the fourth push, the logs went to VLC1.
At the fifth push, I sent 28000 events (almost simultaneously) and they were split between both VLCs.
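This demonstrates that the load has been balanced equally between the two VLCs. Note that the balancing happens per TCP connection: a Network Load Balancer routes each connection to a single target for its whole lifetime, which would explain why each push of 14000 events landed entirely on one VLC, while the almost simultaneous pushes of the fifth test were spread across both.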
Now I stop VLC1 (specifically, I stopped the rabbitmq-service on VLC1) and push another 14000 logs:
In both cases above, VLC2 received the whole 14000-event chunk, since VLC1 was failing its health check on port 5671. We can safely say that failover is working!
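The behaviour seen in both tests can be reproduced with a toy model. AWS documents that a Network Load Balancer picks a target for each TCP connection with a flow hash (based on the protocol, addresses and ports) and keeps that connection on the same target, choosing only among healthy targets. The sketch below uses SHA-256 over the connection tuple as a hypothetical stand-in for that internal hash, and it ignores per-Availability-Zone details such as cross-zone load balancing. It only illustrates why one push sticks to one VLC, why pushes from different source ports can spread across both, and why all new connections go to VLC2 once VLC1 is unhealthy.

```python
import hashlib
from typing import Sequence, Set, Tuple

Flow = Tuple[str, int, str, int]  # (src_ip, src_port, dst_ip, dst_port)


def pick_target(flow: Flow, targets: Sequence[str], healthy: Set[str]) -> str:
    """Pick one healthy target for a TCP flow from a hash of the flow tuple.

    Toy model only: SHA-256 stands in for the load balancer's internal flow
    hash. The same connection always hashes to the same value, so it sticks
    to one target; unhealthy targets are never chosen.
    """
    candidates = [t for t in targets if t in healthy]
    if not candidates:
        raise RuntimeError("no healthy targets")
    key = "tcp|%s|%d|%s|%d" % flow
    digest = hashlib.sha256(key.encode()).digest()
    return candidates[int.from_bytes(digest[:8], "big") % len(candidates)]


if __name__ == "__main__":
    vlcs = ["172.24.185.105", "172.24.185.126"]  # VLC1, VLC2 from the test
    lb_ip = "10.0.0.10"  # hypothetical load balancer node address
    source = "10.0.1.20"  # hypothetical event source address
    # Five pushes, each on a new ephemeral source port.
    for src_port in range(40000, 40005):
        print(src_port, pick_target((source, src_port, lb_ip, 514), vlcs, set(vlcs)))
    # VLC1 fails its health check on 5671: new flows can only go to VLC2.
    print(pick_target((source, 40005, lb_ip, 514), vlcs, {vlcs[1]}))
```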