|Applies To||RSA Product Set: RSA Netwitness Logs and Packets|
RSA Product/Service Type: All Netwitness Logs and Packets Nodes.
RSA Version/Condition: 10.x
- The service keeps respawning with a different process ID (PID) each time, and is then killed by the kernel.
- This happens when the server is heavily utilized and more than one process is using memory intensively.
- The kernel repeatedly logs error messages showing that the OOM ("Out Of Memory") Killer was invoked.
- The customer has to start the service manually.
- This problem occurs when the server has too little RAM for its workload; in other words, the concurrent processes use more memory than the server can actually provide.
- By default, the kernel sets vm.overcommit_memory to '0' (heuristic overcommit). This means that when an application requests memory (for example, when a program calls malloc()), the kernel grants the requested addresses unless the request is obviously excessive, on the expectation that the applications will never actually use all of these addresses, or at least not at the same time. The memory is counted as truly used only when a program touches the allocation by reading from or writing to it.
- The problem arises when these programs do use their allocations at the same time and the kernel runs out of actual memory to back them, so the overcommit strategy fails. The kernel then invokes the out-of-memory killer ("oom-killer"), which selects one process to sacrifice and sends it the KILL signal. This can be seen in the kernel messages in /var/log/messages, as in the excerpt below:
Jun 13 12:56:41 Dec1 kernel: NwWarehouseConn invoked oom-killer: gfp_mask=0x280da, order=0, oom_adj=0, oom_score_adj=0
Jun 13 12:56:41 Dec1 kernel: Out of memory: Kill process 14010 (NwDecoder) score 326 or sacrifice child
Jun 13 12:56:41 Dec1 kernel: Killed process 14010, UID 0, (NwDecoder) total-vm:101376128kB, anon-rss:23774892kB, file-rss:616kB
Jun 13 12:56:41 Dec1 collectd: NgNativeReader_NwWarehouseConnector-FastUpdate: nwsdk failure: NwSendMessage returned 0; code 109; error: 60 second timeout reached waiting for server response
Jun 13 12:56:41 Dec1 init: nwdecoder main process (14010) killed by KILL signal
Jun 13 12:56:41 Dec1 init: nwdecoder main process ended, respawning
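The kernel and init lines in an excerpt like the one above follow a regular pattern, so they can be summarized with a short script. The following is only a minimal sketch: the function name summarize_oom and the dictionary keys are illustrative, and the regular expressions are written against the exact message wording shown above, which may differ on other kernel versions.

```python
import re

# Kernel and init lines copied from the excerpt above. In practice the text
# would come from /var/log/messages instead of this embedded sample.
SAMPLE_LOG = """\
Jun 13 12:56:41 Dec1 kernel: NwWarehouseConn invoked oom-killer: gfp_mask=0x280da, order=0, oom_adj=0, oom_score_adj=0
Jun 13 12:56:41 Dec1 kernel: Out of memory: Kill process 14010 (NwDecoder) score 326 or sacrifice child
Jun 13 12:56:41 Dec1 kernel: Killed process 14010, UID 0, (NwDecoder) total-vm:101376128kB, anon-rss:23774892kB, file-rss:616kB
Jun 13 12:56:41 Dec1 init: nwdecoder main process (14010) killed by KILL signal
Jun 13 12:56:41 Dec1 init: nwdecoder main process ended, respawning
"""

# Patterns assume the message format shown in the excerpt.
INVOKED = re.compile(r"kernel: (?P<name>\S+) invoked oom-killer")
KILLED = re.compile(
    r"kernel: Killed process (?P<pid>\d+),.*?\((?P<name>[^)]+)\)"
    r" total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB"
)
RESPAWN = re.compile(r"init: (?P<service>\S+) main process ended, respawning")


def summarize_oom(text):
    """Return one dict per oom-killer invocation found in the log text."""
    events = []
    current = None
    for line in text.splitlines():
        match = INVOKED.search(line)
        if match:
            # A new invocation starts a new event record.
            current = {"invoked_by": match.group("name")}
            events.append(current)
            continue
        if current is None:
            continue  # ignore lines that precede the first invocation
        match = KILLED.search(line)
        if match:
            current["killed_pid"] = int(match.group("pid"))
            current["killed_name"] = match.group("name")
            current["total_vm_kb"] = int(match.group("total_vm"))
            current["anon_rss_kb"] = int(match.group("anon_rss"))
            continue
        match = RESPAWN.search(line)
        if match:
            current["respawned"] = match.group("service")
    return events


if __name__ == "__main__":
    for event in summarize_oom(SAMPLE_LOG):
        print(event)
```

Keeping the process that triggered the oom-killer separate from the process that was killed matters here, because in the excerpt they are not the same process.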
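The logs show the full sequence: the oom-killer was invoked while NwWarehouseConn was requesting memory, the kernel selected NwDecoder (PID 14010, score 326) as the process to sacrifice and sent it the KILL signal, and init then respawned the nwdecoder service, which comes back with a new PID.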
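To see how far a server has overcommitted its memory, the current vm.overcommit_memory mode can be read from /proc/sys/vm/overcommit_memory and compared with the MemTotal and Committed_AS fields of /proc/meminfo. The sketch below assumes a Linux host with /proc mounted; the helper names parse_meminfo, commit_ratio and report are illustrative, and the ratio is only a rough indicator because it ignores swap.

```python
from pathlib import Path

# Meaning of the three values vm.overcommit_memory can take on Linux.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (kernel default)",
    1: "always overcommit",
    2: "strict accounting, no overcommit beyond CommitLimit",
}


def parse_meminfo(text):
    """Parse /proc/meminfo text into a dict of field name -> integer value.

    Most fields are reported in kB; a few (such as HugePages_Total) are counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def commit_ratio(fields):
    """Memory promised to processes (Committed_AS) relative to physical RAM."""
    return fields["Committed_AS"] / fields["MemTotal"]


def report():
    """Print the overcommit mode and commit ratio of the local machine."""
    mode_path = Path("/proc/sys/vm/overcommit_memory")
    meminfo_path = Path("/proc/meminfo")
    if not (mode_path.exists() and meminfo_path.exists()):
        print("This sketch needs a Linux host with /proc mounted.")
        return
    mode = int(mode_path.read_text().strip())
    fields = parse_meminfo(meminfo_path.read_text())
    print(f"vm.overcommit_memory = {mode}: {OVERCOMMIT_MODES.get(mode, 'unknown')}")
    print(f"MemTotal     = {fields['MemTotal']} kB")
    print(f"Committed_AS = {fields['Committed_AS']} kB")
    print(f"Committed_AS / MemTotal = {commit_ratio(fields):.2f}")


if __name__ == "__main__":
    report()
```

A ratio above 1.0 means the kernel has promised more memory than the server has in physical RAM, which is the condition under which the oom-killer can be invoked once the processes actually use their allocations.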
|Notes||If you are having the same problem, but you are unsure of these steps, please contact email@example.com|