How to configure HA log collection?
I am interested in whether anyone has found a robust solution for fault-tolerant log collection in their NetWitness Logs architecture. What I usually see are recommendations to configure a VLC to fail over to a second Log Decoder (Local Log Collector) in case of a failure. This does not help when the VLC itself has a problem or is being upgraded: in that case nothing is accepting the incoming logs.
We have tried to work around this by placing an F5 load balancer in front of the VLCs. However, we would prefer to use TCP for syslog forwarding where possible, and with TCP we lose the actual device.ip, which gets replaced by the F5 SNAT IP. As you might imagine, losing the real device.ip then leads to all sorts of problems with ESM and so on.
Has anyone found a decent solution (besides using UDP and an external load balancer) for this problem?
Just found this earlier post: VLC Failover without using a third-party load balance solution. This might actually be what I am looking for. I'm still eager to hear about any experiences with that, or anything else regarding this issue.
I spoke with an RSA resource about the potential to do HA/load balancing for log collection; perhaps he can add his expertise directly in this thread.
I've tested using an F5 VIP for UDP syslog and, as you mentioned, it works great. For TCP, however, we have that SNAT problem.
Instead, we're looking at creating a round-robin Infoblox record that keeps all the destination collector IPs in one A record and cycles through them as requested. A few benefits and problems of this approach:
- The source IP is always maintained for the log source regardless of protocol.
- A single destination FQDN for all configurations regardless of source.
- Load balancing is achieved, albeit not as elegantly as it would be with an LTM VIP.
- No additional infrastructure we're dependent on that may have problems handling our throughput.
- If we want to remove an IP we can update the A record. The change will take time to replicate, but we'll keep the TTL of that record lower than usual to help it propagate across the environment.
- There is no 'health' monitoring, which a good failover design should have. If a collector goes down or has problems, sources will still attempt to send to it, unless Infoblox has a solution for that as well.
Keep in mind I haven't tested the round-robin DNS record method yet; it just seems to provide enough benefit with little impact that we're going to explore it some more.
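To make the round-robin idea a bit more concrete, here is a minimal Python sketch of what a well-behaved TCP syslog sender could do with such a record: resolve every address behind the FQDN and try them in order until one accepts the connection. Because the sender connects straight to a collector, the source IP is preserved. The function names, the port 514 default, and the newline-terminated message framing are assumptions for illustration only; real log sources may simply cache one resolved address and never retry the others, which is exactly the weakness noted in the last bullet.

```python
import socket
from typing import Callable, Iterable, List, Optional


def resolve_a_records(fqdn: str, port: int = 514) -> List[str]:
    """Return the distinct IPv4 addresses behind fqdn, in resolver order."""
    infos = socket.getaddrinfo(
        fqdn, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    addresses: List[str] = []
    for info in infos:
        ip = info[4][0]  # sockaddr is (ip, port) for AF_INET
        if ip not in addresses:
            addresses.append(ip)
    return addresses


def send_with_failover(
    message: bytes,
    addresses: Iterable[str],
    port: int = 514,
    connect: Callable = socket.create_connection,
    timeout: float = 3.0,
) -> Optional[str]:
    """Try each collector in turn; return the IP that accepted the message.

    Returns None if no collector in the pool could be reached.
    """
    for ip in addresses:
        try:
            with connect((ip, port), timeout=timeout) as conn:
                conn.sendall(message)
            return ip
        except OSError:
            # Collector is down or refusing connections: try the next one.
            continue
    return None
```

A sender would call `send_with_failover(b"<134>myhost app: test\n", resolve_a_records("collectors.example.com"))`, where the FQDN is a placeholder for the round-robin record. The `connect` parameter exists only so the retry logic can be exercised without a network.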
This could work and would be a no/low-cost solution. However, what measures are in place to ensure the hosts in that 'DNS pool' are available? What happens (how does Infoblox handle it) when one of the pool members is down?
You're right Naushad, there is nothing Infoblox does from a health perspective. I'd have to MANUALLY update the round-robin A record and remove hosts that are down.
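One way to take some of the pain out of that manual step is a small probe that reports which pool members currently accept TCP connections, so you at least know when the A record needs editing. The sketch below is an untested illustration, not something from our environment: the function names and the port 514 default are assumptions, and it deliberately does not touch Infoblox, so updating the record stays a separate (manual or scripted) action.

```python
import socket
from typing import Callable, Dict, Iterable, List


def tcp_port_open(ip: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to ip:port succeeds within timeout."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def split_pool(
    members: Iterable[str],
    port: int = 514,
    probe: Callable[[str, int], bool] = tcp_port_open,
) -> Dict[str, List[str]]:
    """Sort pool members into 'up' and 'down' lists based on the probe."""
    status: Dict[str, List[str]] = {"up": [], "down": []}
    for ip in members:
        status["up" if probe(ip, port) else "down"].append(ip)
    return status


if __name__ == "__main__":
    # Placeholder addresses; replace with the IPs in the round-robin record.
    pool = ["192.0.2.10", "192.0.2.11"]
    result = split_pool(pool)
    for ip in result["down"]:
        print(f"{ip} is not accepting TCP syslog; consider removing it from the A record")
```

Note that a successful TCP connect only shows the port is listening; it does not prove the collector is actually processing logs, so treat it as a coarse check.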