Article Content
Article Number | 000038886 |
Applies To | RSA Product Set: RSA NetWitness Platform; RSA Product/Service Type: All Servers; RSA Version/Condition: 11.4.x; Platform: CentOS; O/S Version: 7 |
Issue | To see the article in a demo format, view the RSA EduTube video on RabbitMQ file descriptor limit reached in RSA NetWitness Platform 11.4.x.
The RabbitMQ service on the RSA NetWitness appliance appears to have stopped processing even though the service is still running. Running netstat on the server shows a large number of connections, possibly in the thousands, associated with the RabbitMQ (beam.smp) process.
The following messages may be found in /var/log/rabbitmq/rabbit_<UUID>.log:
2020-04-15 14:10:08.053 [warning] <0.584.0> Ranch acceptor reducing accept rate: out of file descriptors
2020-04-15 14:10:08.056 [error] <0.19260.1138> CRASH REPORT Process <0.19260.1138> with 0 neighbours exited with reason: bad argument in call to ets:lookup(rabbit_exchange, {resource,<<"/rsa/system">>,exchange,<<"carlos.audit">>}) in rabbit_misc:dirty_read/1 line 395
2020-04-15 14:10:08.056 [error] <0.19626.1138> CRASH REPORT Process <0.19626.1138> with 0 neighbours exited with reason: bad argument in call to ets:lookup(rabbit_exchange, {resource,<<"/rsa/system">>,exchange,<<"carlos.sms.collectd">>}) in rabbit_misc:dirty_read/1 line 395
2020-04-15 14:10:08.056 [error] <0.3771.0> Supervisor {<0.3771.0>,rabbit_federation_link_sup} had child {upstream [<<"amqps://10.41.82.34:5671?auth_mechanism=external">>], <<"carlos.audit">>,<<"carlos.audit">>,1000,1,5,3600000,none,false, 'on-confirm',none, <<"carlos-upstream-f51f708a-d04e-437f-8e3c-2b46672bf1cb">>,false} started with rabbit_federation_exchange_link:start_link({{upstream, [<<"amqps://10.41.82.34:5671?auth_mechanism=external">>],<<"carlos.audit">>,<<"carlos....">>,...},...}) at {restarting,<0.6913.1050>} exit with reason bad argument in call to ets:lookup(rabbit_exchange, {resource,<<"/rsa/system">>,exchange,<<"carlos.audit">>}) in rabbit_misc:dirty_read/1 line 395 in context start_error
2020-04-15 14:10:08.057 [error] <0.2635.0> Supervisor {<0.2635.0>,rabbit_federation_link_sup} had child {upstream,[<<"amqps://10.41.82.32:5671?auth_mechanism=external">>], <<"carlos.sms.collectd">>,<<"carlos.sms.collectd">>,1000,1,5, 3600000,none,false,'on-confirm',none, <<"carlos-upstream-18e5b1f6-1698-4a55-848b-cbda1d3d8380">>,false} started with rabbit_federation_exchange_link:start_link({{upstream [<<"amqps://10.41.82.32:5671?auth_mechanism=external">>],<<"carlos.sms.collectd">>,<<"...">>,...},...}) at {restarting,<0.7949.1050>} exit with reason bad argument in call to ets:lookup(rabbit_exchange, {resource,<<"/rsa/system">>,exchange,<<"carlos.sms.collectd">>}) in rabbit_misc:dirty_read/1 line 395 in context start_error
2020-04-15 14:10:08.058 [warning] <0.587.0> Ranch acceptor reducing accept rate: out of file descriptors
2020-04-15 14:10:08.064 [warning] <0.579.0> Ranch acceptor reducing accept rate: out of file descriptors
2020-04-15 14:10:08.066 [warning] <0.600.0> Ranch acceptor reducing accept rate: out of file descriptors
2020-04-15 14:10:08.066 [error] <0.19116.1138> CRASH REPORT Process <0.19116.1138> with 0 neighbours exited with reason: bad argument in call to ets:lookup(rabbit_exchange, {resource,<<"/rsa/system">>,exchange,<<"carlos.audit">>}) in rabbit_misc:dirty_read/1 line 395
2020-04-15 14:10:08.066 [error] <0.3771.0> Supervisor {<0.3771.0>,rabbit_federation_link_sup} had child {upstream,[<<"amqps://10.203.128.181:5671?auth_mechanism=external">>], <<"carlos.audit">>,<<"carlos.audit">>,1000,1,5,3600000,none,false, 'on-confirm',none, <<"carlos-upstream-b3ad4751-6cc5-4f67-8d50-ca20c2b25fed">>,false} started with rabbit_federation_exchange_link:start_link({{upstream,[<<"amqps://10.203.128.181:5671?auth_mechanism=external">>],<<"carlos.audit">>,<<"carl...">>,...},...}) at {restarting,<0.6090.1050>} exit with reason bad argument in call to ets:lookup(rabbit_exchange, {resource,<<"/rsa/system">>,exchange,<<"carlos.audit">>}) in rabbit_misc:dirty_read/1 line 395 in context start_error
2020-04-15 14:10:08.069 [warning] <0.586.0> Ranch acceptor reducing accept rate: out of file descriptors
2020-04-15 14:10:08.071 [warning] <0.583.0> Ranch acceptor reducing accept rate: out of file descriptors
2020-04-15 14:10:08.073 [error] <0.19158.1138> CRASH REPORT Process <0.19158.1138> with 0 neighbours exited with reason: bad argument in call to ets:lookup(rabbit_exchange, {resource,<<"/rsa/system">>,exchange,<<"carlos.sms.collectd">>}) in rabbit_misc:dirty_read/1 line 395
2020-04-15 14:10:08.073 [error] <0.2635.0> Supervisor {<0.2635.0>,rabbit_federation_link_sup} had child {upstream,[<<"amqps://153.7.72.225:5671?auth_mechanism=external">>], <<"carlos.sms.collectd">>,<<"carlos.sms.collectd">>,1000,1,5, 3600000,none,false,'on-confirm',none, <<"carlos-upstream-698a3d8d-ba3e-4a93-a25c-b1185a966e86">>,false} started with rabbit_federation_exchange_link:start_link({{upstream,[<<"amqps://153.7.72.225:5671?auth_mechanism=external">>], <<"carlos.sms.collectd">>,...},...}) at {restarting,<0.8430.1050>} exit with reason bad argument in call to ets:lookup(rabbit_exchange, {resource,<<"/rsa/system">>,exchange,<<"carlos.sms.collectd">>}) in rabbit_misc:dirty_read/1 line 395 in context start_error
2020-04-15 14:10:08.081 [warning] <0.599.0> Ranch acceptor reducing accept rate: out of file descriptors
2020-03-02 17:19:46.106 [error] <0.19709.3856> CRASH REPORT Process <0.19709.3856> with 0 neighbours exited with reason: {aborted,{no_exists,[rabbit_runtime_parameters,cluster_name]}} in mnesia:abort/1 line 355
2020-03-02 17:19:46.106 [error] <0.15120.3869> Supervisor {<0.15120.3869>,rabbit_connection_sup} had child reader started with rabbit_reader:start_link(<0.17481.3872>, {acceptor,{0,0,0,0,0,0,0,0},5672}) at <0.19709.3856> exit with reason {aborted,{no_exists,[rabbit_runtime_parameters,cluster_name]}} in context child_terminated
2020-03-02 17:19:46.106 [error] <0.15120.3869> Supervisor {<0.15120.3869>,rabbit_connection_sup} had child reader started with rabbit_reader:start_link(<0.17481.3872>, {acceptor,{0,0,0,0,0,0,0,0},5672}) at <0.19709.3856> exit with reason reached_max_restart_intensity in context shutdown
2020-03-02 17:19:46.156 [error] <0.4268.3859> CRASH REPORT Process <0.4268.3859> with 0 neighbours exited with reason: bad argument in call to ets:lookup(rabbit_exchange, {resource,<<"/rsa/system">>,exchange,<<"carlos.alerts">>}) in rabbit_misc:dirty_read/1 line 395
2020-03-02 17:19:46.157 [error] <0.455.0> Supervisor {<0.455.0>,rabbit_federation_link_sup} had child {upstream,[<<"amqps://172.19.108.192:5671?auth_mechanism=external">>], <<"carlos.alerts">>,<<"carlos.alerts">>,1000,1,5,3600000,none,false, 'on-confirm',none, <<"carlos-upstream-d40020aa-9396-4412-bde2-58f863530e9d">>,false} started with rabbit_federation_exchange_link:start_link({{upstream,[<<"amqps://172.19.108.192:5671?auth_mechanism=external">>],<<"carlos.alerts">>,<<"car...">>,...},...}) at {restarting,<0.10709.1780>} exit with reason bad argument in call to ets:lookup(rabbit_exchange, {resource,<<"/rsa/system">>,exchange,<<"carlos.alerts">>}) in rabbit_misc:dirty_read/1 line 395 in context start_error
The following messages may be found in /var/log/rabbitmq/log/crash.log:
2020-04-15 14:15:58 =CRASH REPORT==== crasher: initial call: amqp_gen_connection:init/1 pid: <0.22077.1048> registered_name: [] exception error: {function_clause,[{amqp_gen_connection,terminate,[{shutdown,{gen_server2,call,[file_handle_cache,{obtain,1,socket,<0.22077.1048>},infinity]}},{<0.23240.1048>,{amqp_params_network,<<"guest">>,<<"guest">>,<<"/rsa/system">>,"10.224.254.214",5671,2047,0,10,60000,[],[#Fun<amqp_uri.12.79294410>],[{<<"connection_name">>,longstr,<<"Federation link (upstream: carlos-upstream-93ec817a-188c-41cd-b66a-cb370f023615, policy: carlos-federate)">>}],[]}}],[{file,"src/amqp_gen_connection.erl"},{line,239}]},{gen_server,try_terminate,3,[{file,"gen_server.erl"},{line,673}]},{gen_server,terminate,10,[{file,"gen_server.erl"},{line,858}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]} ancestors: [<0.23914.1048>,amqp_sup,<0.259.0>] message_queue_len: 0 messages: [] links: [<0.23914.1048>] dictionary: [] trap_exit: true status: running heap_size: 1598 stack_size: 27 reductions: 1097 neighbours:
2020-04-15 19:46:33 =SUPERVISOR REPORT==== Supervisor: {<0.20717.45>,amqp_channel_sup_sup} Context: shutdown_error Reason: shutdown Offender: [{nb_children,1},{name,channel_sup},{mfargs,{amqp_channel_sup,start_link,[network,<0.20634.45>,<<"client 153.7.72.222:47578 -> 10.95.222.3:5671">>]}},{restart_type,temporary},{shutdown,infinity},{child_type,supervisor}]
2020-04-15 20:06:47 =SUPERVISOR REPORT==== Supervisor: {<0.12850.48>,amqp_channel_sup_sup} Context: shutdown_error Reason: shutdown Offender: [{nb_children,1},{name,channel_sup},{mfargs,{amqp_channel_sup,start_link,[network,<0.12775.48>,<<"client 153.7.72.222:36501 -> 10.95.222.6:5671">>]}},{restart_type,temporary},{shutdown,infinity},{child_type,supervisor}]
2020-03-31 14:14:56 =CRASH REPORT==== crasher: initial call: rabbit_federation_exchange_link:init/1 pid: <0.28993.2084> registered_name: [] exception exit: {{badarg,[{ets,lookup,[rabbit_exchange,{resource,<<"/rsa/system">>,exchange,<<"carlos.audit">>}],[]},{rabbit_misc,dirty_read,1,[{file,"src/rabbit_misc.erl"},{line,395}]},{rabbit_federation_exchange_link,init,1,[{file,"src/rabbit_federation_exchange_link.erl"},{line,76}]},{gen_server2,init_it,6,[{file,"src/gen_server2.erl"},{line,554}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]},[{gen_server2,init_it,6,[{file,"src/gen_server2.erl"},{line,597}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]} ancestors: [<0.678.0>,<0.413.0>,rabbit_federation_exchange_link_sup_sup,rabbit_federation_sup,rabbit_sup,<0.287.0>] message_queue_len: 0 messages: [] links: [<0.678.0>] dictionary: [] trap_exit: false status: running heap_size: 610 stack_size: 27 reductions: 241 neighbours:
2020-03-31 14:14:56 =SUPERVISOR REPORT==== Supervisor: {<0.678.0>,rabbit_federation_link_sup} Context: start_error Reason: {badarg,[{ets,lookup,[rabbit_exchange,{resource,<<"/rsa/system">>,exchange,<<"carlos.audit">>}],[]},{rabbit_misc,dirty_read,1,[{file,"src/rabbit_misc.erl"},{line,395}]},{rabbit_federation_exchange_link,init,1,[{file,"src/rabbit_federation_exchange_link.erl"},{line,76}]},{gen_server2,init_it,6,[{file,"src/gen_server2.erl"},{line,554}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]} Offender: [{pid,{restarting,<0.13111.682>}},{name,{upstream,[<<"amqps://10.100.6.20:5671?auth_mechanism=external">>],<<"carlos.audit">>,<<"carlos.audit">>,1000,1,5,3600000,none,false,'on-confirm',none,<<"carlos-upstream-14a5aeef-6a2b-4918-a048-97abea48151a">>,false}},{mfargs,{rabbit_federation_exchange_link,start_link,[{{upstream,[<<"amqps://10.100.6.20:5671?auth_mechanism=external">>],<<"carlos.audit">>,<<"carlos.audit">>,1000,1,5,3600000,none,false,'on-confirm',none,<<"carlos-upstream-14a5aeef-6a2b-4918-a048-97abea48151a">>,false},{resource,<<"/rsa/system">>,exchange,<<"carlos.audit">>}}]}},{restart_type,{permanent,5}},{shutdown,30000},{child_type,worker}]
2020-03-31 14:14:56 =CRASH REPORT==== crasher: initial call: rabbit_federation_exchange_link:init/1 pid: <0.25092.2090> registered_name: [] exception exit: {{badarg,[{ets,lookup,[rabbit_exchange,{resource,<<"/rsa/system">>,exchange,<<"carlos.alerts">>}],[]},{rabbit_misc,dirty_read,1,[{file,"src/rabbit_misc.erl"},{line,395}]},{rabbit_federation_exchange_link,init,1,[{file,"src/rabbit_federation_exchange_link.erl"},{line,76}]},{gen_server2,init_it,6,[{file,"src/gen_server2.erl"},{line,554}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]},[{gen_server2,init_it,6,[{file,"src/gen_server2.erl"},{line,597}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]} ancestors: [<0.498.0>,<0.413.0>,rabbit_federation_exchange_link_sup_sup,rabbit_federation_sup,rabbit_sup,<0.287.0>] message_queue_len: 0 messages: [] links: [<0.498.0>] dictionary: [] trap_exit: false status: running heap_size: 610 stack_size: 27 reductions: 241 neighbours:
2020-03-31 14:14:56 =SUPERVISOR REPORT==== Supervisor: {<0.498.0>,rabbit_federation_link_sup} Context: start_error Reason: {badarg,[{ets,lookup,[rabbit_exchange,{resource,<<"/rsa/system">>,exchange,<<"carlos.alerts">>}],[]},{rabbit_misc,dirty_read,1,[{file,"src/rabbit_misc.erl"},{line,395}]},{rabbit_federation_exchange_link,init,1,[{file,"src/rabbit_federation_exchange_link.erl"},{line,76}]},{gen_server2,init_it,6,[{file,"src/gen_server2.erl"},{line,554}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]} Offender: [{pid,{restarting,<0.12183.682>}},{name,{upstream,[<<"amqps://10.100.217.26:5671?auth_mechanism=external">>],<<"carlos.alerts">>,<<"carlos.alerts">>,1000,1,5,3600000,none,false,'on-confirm',none,<<"carlos-upstream-5d4a0f18-4c24-4a17-8f3d-759d96cf4e50">>,false}},{mfargs,{rabbit_federation_exchange_link,start_link,[{{upstream,[<<"amqps://10.100.217.26:5671?auth_mechanism=external">>],<<"carlos.alerts">>,<<"carlos.alerts">>,1000,1,5,3600000,none,false,'on-confirm',none,<<"carlos-upstream-5d4a0f18-4c24-4a17-8f3d-759d96cf4e50">>,false},{resource,<<"/rsa/system">>,exchange,<<"carlos.alerts">>}}]}},{restart_type,{permanent,5}},{shutdown,30000},{child_type,worker}]
2020-03-02 17:19:46 =SUPERVISOR REPORT==== Supervisor: {<0.31133.3865>,rabbit_connection_sup} Context: shutdown Reason: reached_max_restart_intensity Offender: [{pid,<0.23708.3863>},{name,reader},{mfargs,{rabbit_reader,start_link,[<0.7447.3872>,{acceptor,{0,0,0,0,0,0,0,0},5672}]}},{restart_type,intrinsic},{shutdown,30000},{child_type,worker}]
2020-03-02 17:19:46 =CRASH REPORT==== crasher: initial call: rabbit_federation_exchange_link:init/1 pid: <0.23253.3872> registered_name: [] exception exit: {{badarg,[{ets,lookup,[rabbit_exchange,{resource,<<"/rsa/system">>,exchange,<<"carlos.alerts">>}],[]},{rabbit_misc,dirty_read,1,[{file,"src/rabbit_misc.erl"},{line,395}]},{rabbit_federation_exchange_link,init,1,[{file,"src/rabbit_federation_exchange_link.erl"},{line,76}]},{gen_server2,init_it,6,[{file,"src/gen_server2.erl"},{line,554}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]},[{gen_server2,init_it,6,[{file,"src/gen_server2.erl"},{line,597}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]} ancestors: [<0.455.0>,<0.385.0>,rabbit_federation_exchange_link_sup_sup,rabbit_federation_sup,rabbit_sup,<0.274.0>] message_queue_len: 0 messages: [] links: [<0.455.0>] dictionary: [] trap_exit: false status: running heap_size: 610 stack_size: 27 reductions: 241 neighbours: |
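The two symptoms described in the Issue field, the "out of file descriptors" warnings and the large number of beam.smp connections, could be checked with a short script. The following is a minimal sketch, not part of the attached workaround. The function names are hypothetical, and the second function assumes the column layout of `netstat -anp` output on CentOS 7 (run as root so that the PID/Program name column is populated).

```python
import re
from collections import Counter

# Warning text taken from the rabbit_<UUID>.log excerpt in this article.
FD_WARNING = "Ranch acceptor reducing accept rate: out of file descriptors"
# Leading timestamp of a log entry, e.g. "2020-04-15 14:10:08.053".
TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2}) (\d{2}:\d{2}):\d{2}")


def count_fd_warnings(lines):
    """Count out-of-file-descriptor warnings per minute (hypothetical helper)."""
    per_minute = Counter()
    for line in lines:
        if FD_WARNING not in line:
            continue
        match = TIMESTAMP.match(line)
        key = f"{match.group(1)} {match.group(2)}" if match else "unknown"
        per_minute[key] += 1
    return per_minute


def count_beam_connections(netstat_output):
    """Count TCP connections owned by beam.smp, grouped by connection state.

    Assumed columns: Proto Recv-Q Send-Q Local Foreign State PID/Program.
    """
    per_state = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        if (
            len(fields) >= 7
            and fields[0].startswith("tcp")
            and fields[6].endswith("/beam.smp")
        ):
            per_state[fields[5]] += 1
    return per_state
```

For example, passing an open handle on the RabbitMQ log file to `count_fd_warnings` gives a rough picture of when the warnings started, and passing captured netstat output to `count_beam_connections` shows whether the connection count is in the thousands.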
Cause | RabbitMQ exhausts its available file descriptors, which brings down the RabbitMQ node even though the service process may remain running. In this state, RabbitMQ stops processing new messages and may not produce a crash dump. |
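To confirm that a process is close to its file descriptor limit, the number of open descriptors can be compared with the limit the kernel reports for that process. The following is a minimal sketch that assumes a Linux host exposing `/proc/<pid>/fd` and `/proc/<pid>/limits`. The function names are hypothetical, the PID of beam.smp must be supplied by the caller, and reading another user's `/proc/<pid>/fd` typically requires root.

```python
import os


def parse_max_open_files(limits_text):
    """Return (soft, hard) 'Max open files' values from /proc/<pid>/limits text."""
    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            fields = line.split()
            # Fields: "Max", "open", "files", <soft>, <hard>, "files"
            return int(fields[3]), int(fields[4])
    raise ValueError("Max open files entry not found")


def fd_usage(pid):
    """Return (open_fds, soft_limit) for a running Linux process."""
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    with open(f"/proc/{pid}/limits") as handle:
        soft, _hard = parse_max_open_files(handle.read())
    return open_fds, soft
```

If `fd_usage` returns an open count at or near the soft limit for the beam.smp process, that is consistent with the cause described above.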
Resolution | To fix this issue, upgrade to RSA NetWitness Platform 11.4.1.2 or 11.5.0 once those versions are available. |
Workaround | Until the official versions are released, a workaround for this issue is available as a download attached to this article (rabbitmq-performance-master.zip).
Note: This script attempts to access the following servers using the REST interface ports: Archiver, Broker, Concentrator, Network/Log Decoder, Endpoint Hybrid, Network/Log Hybrid, VLC, and Malware. The REST interface ports must therefore be accessible to the NW Admin server for the script to function correctly. See the Deployment Guide: Network Architecture and Ports for more information about the REST ports. If the REST interface ports are not open between the NW Admin server and the other RSA NetWitness appliances, see the Manual Change Adjustment method later in this document.
Automated REST Adjustment
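Before running the automated script, it may help to verify from the NW Admin server that the REST interface ports are reachable. The following is a minimal sketch and is not part of the attached script. The function names are hypothetical, and the host names and port numbers must be taken from your own deployment and the Deployment Guide.

```python
import socket


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def unreachable_targets(targets, timeout=3.0):
    """Return the (host, port) pairs from targets that could not be reached."""
    return [
        (host, port)
        for host, port in targets
        if not port_is_reachable(host, port, timeout)
    ]
```

A call such as `unreachable_targets([("decoder1.example.com", 12345)])`, where both the host and the port are placeholders, returns an empty list when every target accepts a TCP connection. A successful TCP connection does not prove that the REST service itself is healthy, only that the port is open.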
Note: The script creates a log file in the directory from which it is run. To enable debugging, edit line 39 of the script and change logging.INFO to logging.DEBUG.
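For readers unfamiliar with Python logging, the change described in the note amounts to lowering the logging threshold so that debug messages are written to the log file. The following is an illustrative sketch only. It does not reproduce line 39 of the attached script, and the function name, logger name, and log format are assumptions.

```python
import logging


def configure_logging(log_path, debug=False):
    """Create a file logger at INFO level, or DEBUG level when debug is True."""
    level = logging.DEBUG if debug else logging.INFO
    logger = logging.getLogger("rabbitmq_workaround_example")  # hypothetical name
    logger.setLevel(level)
    # Remove handlers from any earlier call so messages are not duplicated.
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()
    file_handler = logging.FileHandler(log_path)
    file_handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    )
    logger.addHandler(file_handler)
    return logger
```

With `debug=False`, calls to `logger.debug(...)` are discarded. With `debug=True`, they are written to the log file alongside the informational messages.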
If there are issues using the automated script, see the Manual Change Adjustment section below.
Manual Change Adjustment
If the fix for this issue cannot be performed using the automated script or there are special circumstances that prohibit the script's usage, it is possible to manually perform the changes on the services.
If there are issues with the process above, contact RSA NetWitness Support for further assistance. |