Correlation rule to check for devices not sending logs
I found that the correlation rule NIC023 (Monitor for a device not sending data) is a great base for building a new correlation rule for each device type that alerts when no log data is received. These instructions apply to 3.5.1, which we use for now.
What I have figured out (somebody correct me if I'm wrong) is that enVision produces a 508100 message indicating how many messages have been parsed for a specific device. This message is produced once every minute. If the message says that 0 messages were parsed, then no messages arrived from that device. The device type is also included in that message (for example winevent_nic).
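To make the idea concrete, here is a minimal Python sketch of how I read those per-minute statistics. It is not enVision code, and I have not shown the real 508100 layout, so the `StatSample` fields, the "acf2" device type name and the addresses are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StatSample:
    """Hypothetical stand-in for one per-minute 508100 statistics message."""
    minute: int        # minute in which the sample was produced
    device_type: str   # e.g. "winevent_nic"
    address: str       # address of the reporting device (assumed field)
    parsed: int        # number of messages parsed during that minute


def silent_samples(samples, device_type):
    """Return the samples of one device type that report zero parsed messages."""
    return [s for s in samples if s.device_type == device_type and s.parsed == 0]


# Illustrative data only; none of these values come from a real system.
samples = [
    StatSample(0, "winevent_nic", "10.0.0.5", 12),
    StatSample(1, "winevent_nic", "10.0.0.5", 0),
    StatSample(1, "acf2", "10.0.0.9", 0),
]
quiet = silent_samples(samples, "winevent_nic")
```

A single zero sample only says that one minute was quiet. The rule built in the steps that follow is what turns a long enough run of such samples into an alert.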
What I did:
- Open the NIC023 rule
- Click on the "Stat1" circuit rule
- Click on the s1 Statement.
- In the Add/Modify statement page you can select the device type by clicking "Select devices by Devices/Class" and then choosing the device class you need to monitor (for example "Host.Windows Hosts/Windows events(NIC)"). It is recommended NOT to change these values, though, since doing so removes a lot of default values. What I did for our ACF2 device was to add Mainframe(ACF2) as a DeviceGroup and then, in the Event selection section, choose eventID for IBM Mainframe(ACF2) IN *. This matches any known message that arrives, so the rule is triggered when no messages arrive. If you choose this latter approach you can skip steps 6 and 7 below (the Set Filter and Add Filter steps).
- Expand the Threshold Definition and set the interval that you think is appropriate for this device type. For example, we have used the default (Consider if 59 messages come within 3600 seconds) for the Windows servers, but set a longer timeframe for systems that send their logs in nightly batches. With the default value, the statement is triggered if no messages arrive within an hour.
- In the Add/Modify statement page, click the Set Filter button at the bottom of the page.
- In the Set Statement filter page you see the condition where count = 0. Add one filter by clicking the Add Filter button at the bottom. Set the operator to AND, the variable to Device, the comparison to IN and the criteria to the device type you are about to set the alert for. For agentless Windows this would be winevent_nic.
- Notice! If you have some inactive devices (a device that has changed IP, has been removed or similar) you need to add an additional filter: AND LocalAddress(IAddr) NOT IN <ip for a machine not producing logs any longer>. If you don't add this last filter you will receive false alerts once every hour, since the inactive device is in fact not producing any logs. This could probably be achieved in several other ways, depending on how many machines are covered by the rule and thus how much work it is to identify the relevant ones.
- Click Apply to get back to the Add/Modify statement page.
- Click Apply to get back to the Add/Modify circuit Definition page.
- Click Apply to get back to the Manage Correlation rules/Add Modify rule page.
- Change the description texts to reflect the changes and set the decay time to 61 minutes for the Windows example above, or longer/shorter depending on the interval you set in step 5 (the threshold step). For this kind of alert the decay time would typically be just above that interval.
- Click Save As and give the alert a good name.
- Done and hopefully it will work. Test it properly before relying on it...
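The steps above can be sketched in Python to show what the finished rule is meant to do. This is only my reading of the logic, not how enVision evaluates it internally: the tuple layout, the per-address grouping and the function names are assumptions, and the one-minute margin for the decay time simply mirrors the 61 minutes used for the Windows example.

```python
from collections import defaultdict, deque


def silent_devices(samples, device_type, excluded_addresses=(),
                   min_hits=59, window_seconds=3600):
    """Return addresses with at least min_hits zero-count samples in the window.

    samples: iterable of (timestamp_seconds, device_type, address, parsed_count)
    tuples, sorted by timestamp. The layout is hypothetical.
    """
    recent = defaultdict(deque)   # address -> timestamps of zero-count samples
    alerting = set()
    for ts, dev, addr, parsed in samples:
        # Filters: count = 0 AND Device IN <type> AND address NOT IN <inactive>.
        if dev != device_type or addr in excluded_addresses or parsed != 0:
            continue
        hits = recent[addr]
        hits.append(ts)
        # Drop hits that have fallen out of the threshold window.
        while hits and ts - hits[0] >= window_seconds:
            hits.popleft()
        if len(hits) >= min_hits:
            alerting.add(addr)
    return alerting


def decay_minutes(window_seconds, margin_minutes=1):
    """Decay time just above the threshold interval, in minutes."""
    return window_seconds // 60 + margin_minutes
```

With the default threshold of 3600 seconds, `decay_minutes(3600)` gives 61. For a nightly batch source you would pass a longer `window_seconds` and a matching `min_hits`, and the decay time grows with it.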
I probably missed something, but somebody will surely correct my errors... thanks. I'm attaching two pieces that we use: one for Mainframe ACF2, loaded batchwise every night, and one for NIC Windows with one server added to