000038886 - RabbitMQ file descriptor limit reached in RSA NetWitness Platform 11.4.x

Document created by RSA Customer Support Employee on May 27, 2020. Last modified by RSA Customer Support Employee on Aug 6, 2020.
Version 7

Article Content

Article Number: 000038886
Applies To
RSA Product Set: RSA NetWitness Platform
RSA Product/Service Type: All Servers
RSA Version/Condition: 11.4.x
Platform: CentOS
O/S Version: 7
Issue

To see the article in a demo format, view the RSA EduTube video on RabbitMQ file descriptor limit reached in RSA NetWitness Platform 11.4.x.




The RabbitMQ service on the RSA NetWitness appliance appears to stop processing messages even though the service is still running. Running netstat on the server shows a large number of connections, possibly in the thousands, associated with the RabbitMQ (beam.smp) process.
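
The connection count described above can be checked with a short script. The following is a minimal sketch, not part of the RSA tooling. It assumes `netstat -anp` output in the usual Linux net-tools layout, where the last column is `PID/Program name`, and it must be run as root for netstat to show the owning process. The function names are illustrative.

```python
import subprocess
from collections import Counter


def count_connections_by_program(netstat_output):
    """Count TCP sockets per owning program in `netstat -anp` style output."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Only TCP rows are of interest; skip headers and UNIX sockets.
        if not fields or not fields[0].startswith("tcp"):
            continue
        # The last column is "PID/Program name", e.g. "1234/beam.smp".
        owner = fields[-1]
        if "/" in owner:
            counts[owner.split("/", 1)[1]] += 1
    return counts


def rabbitmq_connection_count():
    """Run netstat and return the number of TCP sockets owned by beam.smp."""
    output = subprocess.run(
        ["netstat", "-anp"], capture_output=True, text=True, check=True
    ).stdout
    return count_connections_by_program(output)["beam.smp"]
```

A count in the thousands for beam.smp matches the symptom described in this article. A low count suggests a different cause.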

The following messages may be found in the /var/log/rabbitmq/rabbit_<UUID>.log:
 
2020-04-15 14:10:08.053 [warning] <0.584.0> Ranch acceptor reducing accept rate: out of file descriptors
2020-04-15 14:10:08.056 [error] <0.19260.1138> CRASH REPORT Process <0.19260.1138> with 0 neighbours exited with reason: bad argument in call to ets:lookup(rabbit_exchange, {resource,<<"/rsa/system">>,exchange,<<"carlos.audit">>}) in rabbit_misc:dirty_read/1 line 395
2020-04-15 14:10:08.056 [error] <0.19626.1138> CRASH REPORT Process <0.19626.1138> with 0 neighbours exited with reason: bad argument in call to ets:lookup(rabbit_exchange, {resource,<<"/rsa/system">>,exchange,<<"carlos.sms.collectd">>}) in rabbit_misc:dirty_read/1 line 395


2020-04-15 14:10:08.056 [error] <0.3771.0> Supervisor {<0.3771.0>,rabbit_federation_link_sup} had child {upstream


[<<"amqps://10.41.82.34:5671?auth_mechanism=external">>],


<<"carlos.audit">>,<<"carlos.audit">>,1000,1,5,3600000,none,false,


'on-confirm',none,
<<"carlos-upstream-f51f708a-d04e-437f-8e3c-2b46672bf1cb">>,false} started with rabbit_federation_exchange_link:start_link({{upstream,
[<<"amqps://10.41.82.34:5671?auth_mechanism=external">>],<<"carlos.audit">>,<<"carlos....">>,...},...}) at {restarting,<0.6913.1050>} exit with reason bad argument in call to ets:lookup(rabbit_exchange, {resource,<<"/rsa/system">>,exchange,<<"carlos.audit">>}) in rabbit_misc:dirty_read/1 line 395 in context start_error


2020-04-15 14:10:08.057 [error] <0.2635.0> Supervisor {<0.2635.0>,rabbit_federation_link_sup} had child {upstream,[<<"amqps://10.41.82.32:5671?auth_mechanism=external">>],


<<"carlos.sms.collectd">>,<<"carlos.sms.collectd">>,1000,1,5,
3600000,none,false,'on-confirm',none,
<<"carlos-upstream-18e5b1f6-1698-4a55-848b-cbda1d3d8380">>,false} started with rabbit_federation_exchange_link:start_link({{upstream
[<<"amqps://10.41.82.32:5671?auth_mechanism=external">>],<<"carlos.sms.collectd">>,<<"...">>,...},...}) at {restarting,<0.7949.1050>} exit with reason bad argument in call to ets:lookup(rabbit_exchange, {resource,<<"/rsa/system">>,exchange,<<"carlos.sms.collectd">>}) in rabbit_misc:dirty_read/1 line 395 in context start_error


2020-04-15 14:10:08.058 [warning] <0.587.0> Ranch acceptor reducing accept rate: out of file descriptors
2020-04-15 14:10:08.064 [warning] <0.579.0> Ranch acceptor reducing accept rate: out of file descriptors
2020-04-15 14:10:08.066 [warning] <0.600.0> Ranch acceptor reducing accept rate: out of file descriptors
2020-04-15 14:10:08.066 [error] <0.19116.1138> CRASH REPORT Process <0.19116.1138> with 0 neighbours exited with reason: bad argument in call to ets:lookup(rabbit_exchange, {resource,<<"/rsa/system">>,exchange,<<"carlos.audit">>}) in rabbit_misc:dirty_read/1 line 395


2020-04-15 14:10:08.066 [error] <0.3771.0> Supervisor {<0.3771.0>,rabbit_federation_link_sup} had child {upstream,[<<"amqps://10.203.128.181:5671?auth_mechanism=external">>],


<<"carlos.audit">>,<<"carlos.audit">>,1000,1,5,3600000,none,false,
'on-confirm',none, <<"carlos-upstream-b3ad4751-6cc5-4f67-8d50-ca20c2b25fed">>,false} started with rabbit_federation_exchange_link:start_link({{upstream,[<<"amqps://10.203.128.181:5671?auth_mechanism=external">>],<<"carlos.audit">>,<<"carl...">>,...},...}) at {restarting,<0.6090.1050>} exit with reason bad argument in call to ets:lookup(rabbit_exchange, {resource,<<"/rsa/system">>,exchange,<<"carlos.audit">>}) in rabbit_misc:dirty_read/1 line 395 in context start_error


2020-04-15 14:10:08.069 [warning] <0.586.0> Ranch acceptor reducing accept rate: out of file descriptors
2020-04-15 14:10:08.071 [warning] <0.583.0> Ranch acceptor reducing accept rate: out of file descriptors
2020-04-15 14:10:08.073 [error] <0.19158.1138> CRASH REPORT Process <0.19158.1138> with 0 neighbours exited with reason: bad argument in call to ets:lookup(rabbit_exchange, {resource,<<"/rsa/system">>,exchange,<<"carlos.sms.collectd">>}) in rabbit_misc:dirty_read/1 line 395
2020-04-15 14:10:08.073 [error] <0.2635.0> Supervisor {<0.2635.0>,rabbit_federation_link_sup} had child {upstream,[<<"amqps://153.7.72.225:5671?auth_mechanism=external">>], <<"carlos.sms.collectd">>,<<"carlos.sms.collectd">>,1000,1,5, 3600000,none,false,'on-confirm',none, <<"carlos-upstream-698a3d8d-ba3e-4a93-a25c-b1185a966e86">>,false} started with rabbit_federation_exchange_link:start_link({{upstream,[<<"amqps://153.7.72.225:5671?auth_mechanism=external">>],


<<"carlos.sms.collectd">>,...},...}) at {restarting,<0.8430.1050>} exit with reason bad argument in call to ets:lookup(rabbit_exchange, {resource,<<"/rsa/system">>,exchange,<<"carlos.sms.collectd">>}) in rabbit_misc:dirty_read/1 line 395 in context start_error


2020-04-15 14:10:08.081 [warning] <0.599.0> Ranch acceptor reducing accept rate: out of file descriptors
2020-03-02 17:19:46.106 [error] <0.19709.3856> CRASH REPORT Process <0.19709.3856> with 0 neighbours exited with reason: {aborted,{no_exists,[rabbit_runtime_parameters,cluster_name]}} in mnesia:abort/1 line 355
2020-03-02 17:19:46.106 [error] <0.15120.3869> Supervisor {<0.15120.3869>,rabbit_connection_sup} had child reader started with rabbit_reader:start_link(<0.17481.3872>, {acceptor,{0,0,0,0,0,0,0,0},5672}) at <0.19709.3856> exit with reason {aborted,{no_exists,[rabbit_runtime_parameters,cluster_name]}} in context child_terminated
2020-03-02 17:19:46.106 [error] <0.15120.3869> Supervisor {<0.15120.3869>,rabbit_connection_sup} had child reader started with rabbit_reader:start_link(<0.17481.3872>, {acceptor,{0,0,0,0,0,0,0,0},5672}) at <0.19709.3856> exit with reason reached_max_restart_intensity in context shutdown
2020-03-02 17:19:46.156 [error] <0.4268.3859> CRASH REPORT Process <0.4268.3859> with 0 neighbours exited with reason: bad argument in call to ets:lookup(rabbit_exchange, {resource,<<"/rsa/system">>,exchange,<<"carlos.alerts">>}) in rabbit_misc:dirty_read/1 line 395
2020-03-02 17:19:46.157 [error] <0.455.0> Supervisor {<0.455.0>,rabbit_federation_link_sup} had child {upstream,[<<"amqps://172.19.108.192:5671?auth_mechanism=external">>],


<<"carlos.alerts">>,<<"carlos.alerts">>,1000,1,5,3600000,none,false,           
'on-confirm',none,           
<<"carlos-upstream-d40020aa-9396-4412-bde2-58f863530e9d">>,false} started with rabbit_federation_exchange_link:start_link({{upstream,[<<"amqps://172.19.108.192:5671?auth_mechanism=external">>],<<"carlos.alerts">>,<<"car...">>,...},...}) at {restarting,<0.10709.1780>} exit with reason bad argument in call to ets:lookup(rabbit_exchange, {resource,<<"/rsa/system">>,exchange,<<"carlos.alerts">>}) in rabbit_misc:dirty_read/1 line 395 in context start_error


The following messages may be found in /var/log/rabbitmq/log/crash.log: 
 
2020-04-15 14:15:58 =CRASH REPORT====


crasher:


initial call: amqp_gen_connection:init/1
pid: <0.22077.1048>
registered_name: []
exception error: {function_clause,[{amqp_gen_connection,terminate,[{shutdown,{gen_server2,call,[file_handle_cache,{obtain,1,socket,<0.22077.1048>},infinity]}},{<0.23240.1048>,{amqp_params_network,<<"guest">>,<<"guest">>,<<"/rsa/system">>,"10.224.254.214",5671,2047,0,10,60000,[],[#Fun<amqp_uri.12.79294410>],[{<<"connection_name">>,longstr,<<"Federation link (upstream: carlos-upstream-93ec817a-188c-41cd-b66a-cb370f023615, policy: carlos-federate)">>}],[]}}],[{file,"src/amqp_gen_connection.erl"},{line,239}]},{gen_server,try_terminate,3,[{file,"gen_server.erl"},{line,673}]},{gen_server,terminate,10,[{file,"gen_server.erl"},{line,858}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]}
ancestors: [<0.23914.1048>,amqp_sup,<0.259.0>]
message_queue_len: 0
messages: []
links: [<0.23914.1048>]
dictionary: []
trap_exit: true
status: running
heap_size: 1598
stack_size: 27
reductions: 1097


neighbours:


2020-04-15 19:46:33 =SUPERVISOR REPORT====


Supervisor: {<0.20717.45>,amqp_channel_sup_sup}
Context: shutdown_error
Reason: shutdown
Offender: [{nb_children,1},{name,channel_sup},{mfargs,{amqp_channel_sup,start_link,[network,<0.20634.45>,<<"client 153.7.72.222:47578 -> 10.95.222.3:5671">>]}},{restart_type,temporary},{shutdown,infinity},{child_type,supervisor}]



2020-04-15 20:06:47 =SUPERVISOR REPORT====


Supervisor: {<0.12850.48>,amqp_channel_sup_sup}
Context: shutdown_error
Reason: shutdown
Offender: [{nb_children,1},{name,channel_sup},{mfargs,{amqp_channel_sup,start_link,[network,<0.12775.48>,<<"client 153.7.72.222:36501 -> 10.95.222.6:5671">>]}},{restart_type,temporary},{shutdown,infinity},{child_type,supervisor}]

 


2020-03-31 14:14:56 =CRASH REPORT====   


crasher:     


initial call: rabbit_federation_exchange_link:init/1     
pid: <0.28993.2084>     
registered_name: []     
exception exit: {{badarg,[{ets,lookup,[rabbit_exchange,{resource,<<"/rsa/system">>,exchange,<<"carlos.audit">>}],[]},{rabbit_misc,dirty_read,1,[{file,"src/rabbit_misc.erl"},{line,395}]},{rabbit_federation_exchange_link,init,1,[{file,"src/rabbit_federation_exchange_link.erl"},{line,76}]},{gen_server2,init_it,6,[{file,"src/gen_server2.erl"},{line,554}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]},[{gen_server2,init_it,6,[{file,"src/gen_server2.erl"},{line,597}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]}
ancestors: [<0.678.0>,<0.413.0>,rabbit_federation_exchange_link_sup_sup,rabbit_federation_sup,rabbit_sup,<0.287.0>]
message_queue_len: 0
messages: []     
links: [<0.678.0>]     


dictionary: []     
trap_exit: false     
status: running     
heap_size: 610     
stack_size: 27     
reductions: 241   


neighbours:


2020-03-31 14:14:56 =SUPERVISOR REPORT====      


Supervisor: {<0.678.0>,rabbit_federation_link_sup}      


Context:    start_error      
Reason:     {badarg,[{ets,lookup,[rabbit_exchange,{resource,<<"/rsa/system">>,exchange,<<"carlos.audit">>}],[]},{rabbit_misc,dirty_read,1,[{file,"src/rabbit_misc.erl"},{line,395}]},{rabbit_federation_exchange_link,init,1,[{file,"src/rabbit_federation_exchange_link.erl"},{line,76}]},{gen_server2,init_it,6,[{file,"src/gen_server2.erl"},{line,554}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]}      
Offender:   [{pid,{restarting,<0.13111.682>}},{name,{upstream,[<<"amqps://10.100.6.20:5671?auth_mechanism=external">>],<<"carlos.audit">>,<<"carlos.audit">>,1000,1,5,3600000,none,false,'on-confirm',none,<<"carlos-upstream-14a5aeef-6a2b-4918-a048-97abea48151a">>,false}},{mfargs,{rabbit_federation_exchange_link,start_link,[{{upstream,[<<"amqps://10.100.6.20:5671?auth_mechanism=external">>],<<"carlos.audit">>,<<"carlos.audit">>,1000,1,5,3600000,none,false,'on-confirm',none,<<"carlos-upstream-14a5aeef-6a2b-4918-a048-97abea48151a">>,false},{resource,<<"/rsa/system">>,exchange,<<"carlos.audit">>}}]}},{restart_type,{permanent,5}},{shutdown,30000},{child_type,worker}]  

2020-03-31 14:14:56 =CRASH REPORT====   
crasher:     


initial call: rabbit_federation_exchange_link:init/1     
pid: <0.25092.2090>     
registered_name: []     
exception exit: {{badarg,[{ets,lookup,[rabbit_exchange,{resource,<<"/rsa/system">>,exchange,<<"carlos.alerts">>}],[]},{rabbit_misc,dirty_read,1,[{file,"src/rabbit_misc.erl"},{line,395}]},{rabbit_federation_exchange_link,init,1,[{file,"src/rabbit_federation_exchange_link.erl"},{line,76}]},{gen_server2,init_it,6,[{file,"src/gen_server2.erl"},{line,554}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]},[{gen_server2,init_it,6,[{file,"src/gen_server2.erl"},{line,597}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]}
ancestors: [<0.498.0>,<0.413.0>,rabbit_federation_exchange_link_sup_sup,rabbit_federation_sup,rabbit_sup,<0.287.0>]
message_queue_len: 0


messages: []     
links: [<0.498.0>]     
dictionary: []     
trap_exit: false     
status: running    
heap_size: 610     
stack_size: 27     
reductions: 241   


neighbours:


2020-03-31 14:14:56 =SUPERVISOR REPORT====      


Supervisor: {<0.498.0>,rabbit_federation_link_sup}      
Context:    start_error      
Reason:     {badarg,[{ets,lookup,[rabbit_exchange,{resource,<<"/rsa/system">>,exchange,<<"carlos.alerts">>}],[]},{rabbit_misc,dirty_read,1,[{file,"src/rabbit_misc.erl"},{line,395}]},{rabbit_federation_exchange_link,init,1,[{file,"src/rabbit_federation_exchange_link.erl"},{line,76}]},{gen_server2,init_it,6,[{file,"src/gen_server2.erl"},{line,554}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]}
Offender:   [{pid,{restarting,<0.12183.682>}},{name,{upstream,[<<"amqps://10.100.217.26:5671?auth_mechanism=external">>],<<"carlos.alerts">>,<<"carlos.alerts">>,1000,1,5,3600000,none,false,'on-confirm',none,<<"carlos-upstream-5d4a0f18-4c24-4a17-8f3d-759d96cf4e50">>,false}},{mfargs,{rabbit_federation_exchange_link,start_link,[{{upstream,[<<"amqps://10.100.217.26:5671?auth_mechanism=external">>],<<"carlos.alerts">>,<<"carlos.alerts">>,1000,1,5,3600000,none,false,'on-confirm',none,<<"carlos-upstream-5d4a0f18-4c24-4a17-8f3d-759d96cf4e50">>,false},{resource,<<"/rsa/system">>,exchange,<<"carlos.alerts">>}}]}},{restart_type,{permanent,5}},{shutdown,30000},{child_type,worker}]



2020-03-02 17:19:46 =SUPERVISOR REPORT====      
Supervisor: {<0.31133.3865>,rabbit_connection_sup}      
Context:    shutdown      
Reason:     reached_max_restart_intensity      
Offender:   [{pid,<0.23708.3863>},{name,reader},{mfargs,{rabbit_reader,start_link,[<0.7447.3872>,{acceptor,{0,0,0,0,0,0,0,0},5672}]}},{restart_type,intrinsic},{shutdown,30000},{child_type,worker}]  

2020-03-02 17:19:46 =CRASH REPORT====   


crasher:     


initial call: rabbit_federation_exchange_link:init/1     
pid: <0.23253.3872>     
registered_name: []     
exception exit: {{badarg,[{ets,lookup,[rabbit_exchange,{resource,<<"/rsa/system">>,exchange,<<"carlos.alerts">>}],[]},{rabbit_misc,dirty_read,1,[{file,"src/rabbit_misc.erl"},{line,395}]},{rabbit_federation_exchange_link,init,1,[{file,"src/rabbit_federation_exchange_link.erl"},{line,76}]},{gen_server2,init_it,6,[{file,"src/gen_server2.erl"},{line,554}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]},[{gen_server2,init_it,6,[{file,"src/gen_server2.erl"},{line,597}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]}     
ancestors: [<0.455.0>,<0.385.0>,rabbit_federation_exchange_link_sup_sup,rabbit_federation_sup,rabbit_sup,<0.274.0>]
message_queue_len: 0
messages: []     
links: [<0.455.0>]     
dictionary: []     
trap_exit: false     
status: running     
heap_size: 610     
stack_size: 27     
reductions: 241   


neighbours:
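
To check a log file for the message signatures shown above without reading it by hand, something like the following could be used. This is a minimal sketch and not part of the RSA tooling. The patterns are taken from the rabbit_<UUID>.log excerpts in this article (crash.log formats the same errors differently), and the function and key names are illustrative.

```python
import re
from collections import Counter

# Substrings copied from the log excerpts in this article.
SIGNATURES = {
    "out_of_file_descriptors": re.compile(r"out of file descriptors"),
    "ets_lookup_badarg": re.compile(
        r"bad argument in call to ets:lookup\(rabbit_exchange"
    ),
    "max_restart_intensity": re.compile(r"reached_max_restart_intensity"),
}


def count_signatures(lines):
    """Count how many lines match each known signature."""
    counts = Counter()
    for line in lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def scan_log(path):
    """Scan a log file on disk, tolerating undecodable bytes."""
    with open(path, errors="replace") as handle:
        return count_signatures(handle)
```

A nonzero `out_of_file_descriptors` count is the clearest indicator of the condition this article describes.
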
Cause

The RabbitMQ service runs out of file descriptors, which brings down the RabbitMQ node even though the service itself may remain running. In this state, RabbitMQ stops processing new messages but may not produce a crash dump.
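
One way to confirm descriptor exhaustion is to compare the number of open descriptors of the beam.smp process with its limit. The sketch below is an illustration only, assuming a Linux host with the standard /proc layout and root access. The PID must be supplied by the caller, and the function names are hypothetical.

```python
import os


def parse_max_open_files(limits_text):
    """Return the soft 'Max open files' limit from /proc/<pid>/limits text."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Columns: name (three words), soft limit, hard limit, units.
            soft = line.split()[3]
            return None if soft == "unlimited" else int(soft)
    return None


def fd_usage(pid):
    """Return (open_descriptors, soft_limit) for the given process ID."""
    open_count = len(os.listdir(f"/proc/{pid}/fd"))
    with open(f"/proc/{pid}/limits") as handle:
        limit = parse_max_open_files(handle.read())
    return open_count, limit
```

If the open count is at or near the soft limit, the process cannot accept new sockets, which matches the "out of file descriptors" warnings in the log.
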

Resolution

The fix for this issue is to upgrade to either RSA NetWitness Platform 11.4.1.2 or 11.5.0, once those versions are available.
Workaround

Until the official versions are released, a workaround for this issue is available as a download attached to this article (rabbitmq-performance-master.zip).



Note: This script attempts to access the following servers through their REST interface ports: Archiver, Broker, Concentrator, Network/Log Decoder, Endpoint Hybrid, Network/Log Hybrid, VLC, and Malware. The REST interface ports must therefore be reachable from the NW Admin server for the script to function correctly. See the Deployment Guide: Network Architecture and Ports for more information about the REST ports. If the REST interface ports are not open between the NW Admin server and the other RSA NetWitness appliances, use the Manual Change Adjustment method later in this document.


 

Automated REST Adjustment



  1. Download the rabbitmq-performance-master.zip from this knowledge base article.
  2. Copy the archive to the NW Admin server using SCP and extract it.
  3. On the NW Admin server, run the script as shown below.


OWB_ALLOW_NON_FIPS=1 ./rabbitmq_performance_fix.py


  4. The script prompts for the admin user account's password when connecting to the REST interface. If there are issues with the password, the deploy_admin account can be used instead. See the -h/--help option of the script for details.

Note: A log file for the script is created in the directory from which the script is run. To enable debug logging, edit line 39 of the script and change logging.INFO to logging.DEBUG.



  5. Once the script completes, restart the rabbitmq-server service on the NW Admin server.


systemctl restart rabbitmq-server


If there are issues using the automated script, see the Manual Change Adjustment section below.
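
For readers unfamiliar with the logging.INFO and logging.DEBUG constants mentioned in the note on debugging above, the sketch below shows how a log level is typically selected with Python's standard logging module. The script's actual line 39 is not reproduced in this article, so this is not the script's code. The logger name, format string, and function name are hypothetical.

```python
import logging


def build_logger(log_path, debug=False):
    """Create a file logger; debug=True corresponds to logging.DEBUG."""
    level = logging.DEBUG if debug else logging.INFO
    logger = logging.getLogger("rabbitmq_fix_example")  # hypothetical name
    logger.setLevel(level)
    handler = logging.FileHandler(log_path)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

With the level set to logging.INFO, calls to `logger.debug(...)` are discarded. Switching to logging.DEBUG causes them to be written to the log file as well.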


 

Manual Change Adjustment



If the fix for this issue cannot be performed using the automated script or there are special circumstances that prohibit the script's usage, it is possible to manually perform the changes on the services.



  1. Log in to the RSA NetWitness UI.
  2. Go to Admin > Services > <Service Name> > Actions > View > Explore.
  3. Open the node /services/<NWAdmin_UUID>:5671:amqp/config.
  4. Change the following options to the values shown below:

  • auto.open = false
  • reconnect.interval = 0

  5. Restart the rabbitmq-server service on the NW Admin server.


systemctl restart rabbitmq-server


If there are issues with the process above, contact RSA NetWitness Support for further assistance.
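
When the manual change has to be repeated on many services, it can help to record the values seen in the Explore view and check them against the two required settings. The following is a small sketch of such a check, not an RSA-provided tool. It assumes the values have been copied by hand into a dictionary, and the function and constant names are illustrative.

```python
# Values required by the manual workaround in this article.
REQUIRED_AMQP_SETTINGS = {"auto.open": "false", "reconnect.interval": "0"}


def pending_changes(current):
    """Return {option: (current_value, required_value)} for options that differ."""
    pending = {}
    for option, required in REQUIRED_AMQP_SETTINGS.items():
        actual = str(current.get(option, "")).strip().lower()
        if actual != required:
            pending[option] = (current.get(option), required)
    return pending
```

An empty result means both options already hold the workaround values for that service.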
