Eric Partington

Script - Using NwConsole to hunt query performance with Topquery

Blog Post created by Eric Partington Employee on Aug 18, 2017

10.6 introduced a new CLI feature in NwConsole that allowed reporting with topquery.  This neat command allows the parsing of the /var/log/messages file for query commands on brokers, concentrators and archivers to report on query performance.  The is a very helpful command to dig into general analyst performance (best practices, poor syntax, optimization opportunities etc.) and hunt down slowness in the NetWitness Suite.

 

CLI: Commands used for Troubleshooting 

Command Line Interface for version 10.6.4 

 

This is a sample line from the output of topquery

 

  1. # Dec  3 13:43:46 loki NwConcentrator[15854]: [SDK-Values] [audit] User admin (session 54557, 10.105.45.109:49552) has f  
  2. inished values (channel 55739, queued 00:00:00, execute 00:25:13): fieldName=event.cat.name id1=491506493860 id2=7516691  
  3. 19298 threshold=100000 size=20 flags=sessions,sort-total,order-descending,ignore-cache where="time=\"2015-12-03 11:39:00  
  4. \"-\"2015-12-03 17:38:59\""  
  5. /sdk values fieldName=event.cat.name id1=491506493860 id2=751669119298 threshold=100000 size=20 flags=sessions,sort-tota  
  6. l,order-descending,ignore-cache where="time=\"2015-12-03 11:39:00\"-\"2015-12-03 17:38:59\""  

 

Each query is represented twice (the second starts at /sdk).

  • You can see the user that ran the query 'User admin'.  Depending on where the query was run from this may be the actual user logged in (investigation drills) or it may be the service account that was used to connect the RE service to all the components (and RE rules or ESA rules).
  • You are able to see how long the query ran for (execute) and how long it might have waited to execute (queued)
  • You can see the metakey that was use to drill onto in this case from investigator - navigate it was on event.cat.name
  • the timeframe for the query is also displayed
  • the ordering of the data is also shown (order-descending)
  • the default size of 20 values is also used to load on the drill ( a give away that this is the investigate view and default settings)

 

There are lots of options to topquery and it can be very handy to run this during monthly reviews of the platform to review query operations and make sure bad habits arent creeping into the analyst workflows.  One to be especially careful of is Investigate - advanced queries where the times are taking longer than normal to execute - check for queries on non index-values keys as this can create a major performance hit on the system.

 

Here is the help menu output from NwConsole - Topquery

  1. Usage: topQuery [days=#] [hours=#] [time1=<YYYY-MM-DD HH:MM:SS>]  
  2.                 [time2=<YYYY-MM-DD HH:MM:SS>] [user=<username>] [top=N]  
  3.                 [match=<values,query,timeline>] [input=<pathname>]  
  4.                 [delimiter=","] [append=<0,1>] [dateRegex=<regex>]  
  5.                 [regex=<regex>] [show] [distribution] [canceled]  
  6. Returns the top N longest running queries from the audit log (either a file or  
  7. from the log API)  
  8.    
  9.     days         - Indicates the time range to query logs.  hours/days is how  
  10.                    far back from NOW.  
  11.     hours        - Indicates the time range to query logs.  hours/days is how  
  12.                    far back from NOW.  
  13.     time1        - Indicates the starting time range to query logs  
  14.     time2        - Indicates the ending time range to query logs  
  15.     user         - The user who submitted the query, by default searches all  
  16.                    users, use (admin|user1|user2) for multiple  
  17.     top          - The top N queries to display, by default shows the longest  
  18.                    executing 100 queries in the time range  
  19.     match        - The types of queries to match, default is:  
  20.                    values,query,timeline  
  21.     input        - Parse a log file for the queries  
  22.     output       - The optional output path where the logs will be saved,  
  23.                    otherwise logs are written to console  
  24.     append       - If 1, will append to existing file, zero overwrites  
  25.                    (default)  
  26.     dateRegex    - If passed in, this will be used to parse the date (syslog  
  27.                    only) instead of the default  
  28.     regex        - If passed in, this will be used to parse the complete log  
  29.                    instead of the default  
  30.     show         - Shows all the regex expressions that are used by default  
  31.     distribution - Groups query execution times into the provided distribution.  
  32.                    Must be a comma separated list of increasing seconds.  
  33.                    Example: distribution=10,20,30,60,300  
  34.     canceled     - If true, it will analyze canceled queries instead of queries  
  35.                    that finished  

 

Want to see who is downloading pcap files from the decoders and how long it took?

  1. [loki.netwitness.local:50005] /> topquery match=packets top=10  
  2. # 177682836    audit    2016-Feb-19 16:15:20    SDK-Packets    User administrator (session 216639, 10.105.33.127:65256) has finished packets (channel 216675, queued 00:00:00, execute 00:00:01): sessions=26197520000-26197520010 op=start  
  3. /sdk packets sessions=26197520000-26197520010 op=start  
  4.   
  5. # 177682848    audit    2016-Feb-19 16:15:37    SDK-Packets    User administrator (session 216639, 10.105.33.127:65256) has finished packets (channel 216699, queued 00:00:00, execute 00:00:00): sessions=26197520015-26197520020 op=start  
  6. /sdk packets sessions=26197520015-26197520020 op=start  
  7.   
  8. 2 queries were analyzed that match the specified criteria  
  9. 2 (100.0%) queries executed <= 5 seconds  
  10. 0 (0.0%) queries executed <= 10 seconds  
  11. 0 (0.0%) queries executed <= 20 seconds  
  12. 0 (0.0%) queries executed <= 30 seconds  
  13. 0 (0.0%) queries executed <= 60 seconds  
  14. 0 (0.0%) queries executed <= 120 seconds  
  15. 0 (0.0%) queries executed <= 300 seconds  
  16. 0 (0.0%) queries executed <= 600 seconds  
  17. 0 (0.0%) queries executed <= 1200 seconds  
  18. 0 (0.0%) queries executed <= 3600 seconds  
  19. 0 (0.0%) queries executed > 3600 seconds  

 

Want to change the distribution of the time buckets for the query results to something other than default?

  1. > login loki.netwitness.local:50005 administrator  
  2.    
  3. Password: *******************  
  4. Successfully logged in as session 12102   
  5. [loki.netwitness.local:50005] /> topQuery days=7 top=2 distribution=20,40,60,300  
  6.    
  7. # 153347016     audit   2015-Dec-16 14:44:01    SDK-Query       User admin (session 4992, 10.25.55.251:38071) has finish  
  8. ed query (channel 6966, queued 00:00:00, execute 00:11:40): id1=492650373414 id2=663447669617 size=0 flags=0 threshold=0  
  9. query="select * "  
  10. /sdk query id1=492650373414 id2=663447669617 size=0 flags=0 threshold=0 query="select * "  
  11.    
  12. # 152612053     audit   2015-Dec-16 13:32:12    SDK-Values      User admin (session 4774, 137.69.131.57:53254) has finis  
  13. hed values (channel 4841, queued 00:00:00, execute 00:09:54): time2=1450288379 time1=1450266780 size=100000 fieldName=ip  
  14. .src flags=packets  
  15. /sdk values time2=1450288379 time1=1450266780 size=100000 fieldName=ip.src flags=packets  
  16.    
  17.    
  18. 2284 queries were analyzed that match the specified criteria  
  19. 2197 queries executed <= 20 seconds  
  20. 17 queries executed <= 40 seconds  
  21. 22 queries executed <= 60 seconds  
  22. 39 queries executed <= 300 seconds  
  23. 9 queries executed > 300 seconds  

 

Scripting for simplicity and efficiency

The CLI method is perfect for working on an individual system but requesting the information from multiple systems can be a manual process to connect to all systems and pull back data.  So a script was written to help with this process, actually 2 scripts are provided below.

rsa-nwconsole-topquery-parser.py

provides the commands above in a script and prompts for the username, password, host and port interactively to make it easier to remember how to run the commands and then outputs the results to a folder on the system where teh script is run.  Generally running this from the NW/SA server head makes the most sense to run this centrally.

rsa-nwconsole-topquery-chopper.py

this provides the same functions as the above script but reads from csv where you can enter multiple systems to query.  it also chops up the output and provides more details around each query to help you determine where the query comes from in the system.  Queries can come from multiple locations and the only way to determine where they came from was to look at the format and syntax to deduce where the source is.  Potential sources can be Investigation - navigate - drill, basic query, advanced query, events view, RE > alert, RE > chart > RE > report, ESA, Packet Extraction, timeline load.

[root@sa rsa-nwconsole-topquery-chopper]# ./rsa-nwconsole-topquery-chopper.py devices-rsa.csv  

 

  1. 2017-Aug-14 17:44:18 - INFO: Skipping input file row marked as #comment  
  2. 2017-Aug-14 17:44:18 - INFO: Skipping input file row marked as #comment  
  3. 2017-Aug-14 17:44:18 - INFO: Skipping input file row marked as #comment  
  4. 2017-Aug-14 17:44:18 - SCRIPT VERSION: 0.1.11  
  5. 2017-Aug-14 17:44:18 - INFO: Processing concentrator device [concentrator6] at http://concentrator6:50105/  
  6. # NwConsole Communication Error Returned: None  
  7. # Concentrator: Expect SDK-Query from Reporting Engine (Reports) or SDK-Values from Reporting Engine(Alerts,Charts) or SDK-Values from Investigator/Events  
  8. 407929 queries were analyzed that match the specified criteria  
  9. 407817 (100.0%) queries executed <= 5 seconds  
  10. 55 (0.0%) queries executed <= 10 seconds  
  11. 28 (0.0%) queries executed <= 20 seconds  
  12. 12 (0.0%) queries executed <= 30 seconds  
  13. 12 (0.0%) queries executed <= 60 seconds  
  14. 5 (0.0%) queries executed <= 120 seconds  
  15. 0 (0.0%) queries executed <= 300 seconds  
  16. 0 (0.0%) queries executed <= 600 seconds  
  17. 0 (0.0%) queries executed <= 1200 seconds  
  18. 0 (0.0%) queries executed <= 3600 seconds  
  19. 0 (0.0%) queries executed > 3600 seconds  
  20.   
  21.   
  22. RSA Security Analytics Console 10.6.2.2  
  23. Copyright 2001-2017, RSA Security Inc.  All Rights Reserved.  
  24.   
  25.   
  26. >login concentrator6:50005 admin PASSWORD  
  27. Successfully logged in as session 1158004  
  28. >topquery top=100 days=100  
  29. # Source: HEADER  
  30.  10167785       audit   2017-Jul-11 01:13:36    SDK-Values      User admin (session 707083, 192.168.254.140:46294) has finished values (channel 707115, queued 00:00:00, execute 00:01:42): fieldName=alias.host id1=2718729393 id2=6993289982 threshold=100000 size=20 flags=sessions,sort-total,order-descending,ignore-cache where="time=\"2016-02-29 02:44:00\"-\"2017-07-11 01:11:59\""  
  31. # Source: INVESTIGATION - navigate -  metakey drill  
  32. # Query: time=\"2016-02-29 02:44:00\"-\"2017-07-11 01:11:59\  
  33. # Size of results (max returned elements - 0 is unbounded): 20  
  34. # Drill occured on Metakey: alias.host  
  35.  8919357        audit   2017-May-08 14:35:57    SDK-Query       User admin (session 245003, 192.168.254.140:34254) has finished query (channel 245012, queued 00:00:00, execute 00:01:36): id1=1562949600 id2=5911144883 threshold=0 query="select countdistinct(alias.host), avg(alias.host.len), avg(alias.host.ent), countdistinct(ip.src) where (time='2017-Apr-24 14:34:00'-'2017-May-08 14:33:59') && (direction='outbound' && service = 53 && alias.host.root != 'in-addr.arpa','akamaiedge.net','akadns.net','akamai.net','mcafee.com','akamaihd.net') group by alias.host.root order by countdistinct(alias.host) DESC "  
  36. # Source: REPORTING_ENGINE (Report,Chart)  
  37. # Query:  (time='2017-Apr-24 14:34:00'-'2017-May-08 14:33:59') && (direction='outbound' && service = 53 && alias.host.root != 'in-addr.arpa','akamaiedge.net','akadns.net','akamai.net','mcafee.com','akamaihd.net') group by alias.host.root order by countdistinct(alias.host) DESC  
  38.  8919356        audit   2017-May-08 14:35:57    SDK-Query       User admin (session 244975, 192.168.254.140:55558) has finished query (channel 244993, queued 00:00:00, execute 00:01:36): id1=1562949600 id2=5911144883 threshold=0 query="select count(alias.host), countdistinct(ip.src) where (time='2017-Apr-24 14:34:00'-'2017-May-08 14:33:59') && (direction = 'outbound' && service = 53 && alias.host.root != 'in-addr.arpa','akamaiedge.net','akadns.net','akamai.net','mcafee.com','akamaihd.net') group by alias.host,ip.src order by count(alias.host) DESC "  
  39. # Source: REPORTING_ENGINE (Report,Chart)  
  40. # Query:  (time='2017-Apr-24 14:34:00'-'2017-May-08 14:33:59') && (direction = 'outbound' && service = 53 && alias.host.root != 'in-addr.arpa','akamaiedge.net','akadns.net','akamai.net','mcafee.com','akamaihd.net') group by alias.host,ip.src order by count(alias.host) DESC  
  41.  9711797        audit   2017-Jun-16 00:29:15    SDK-Values      User admin (session 21639, 192.168.254.140:51716) has finished values (channel 21810, queued 00:00:00, execute 00:01:15): fieldName=streams id1=2110980282 id2=6419455724 threshold=100000 size=20 flags=sessions,sort-total,order-descending,ignore-cache where="time=\"2017-06-15 14:22:00\"-\"2017-06-15 20:21:59\""  
  42. # Source: INVESTIGATION - navigate -  metakey drill  
  43. # Query: time=\"2017-06-15 14:22:00\"-\"2017-06-15 20:21:59\  
  44. # Size of results (max returned elements - 0 is unbounded): 20  
  45. # Drill occured on Metakey: streams  
  46.  9711796        audit   2017-Jun-16 00:29:13    SDK-Values      User admin (session 21639, 192.168.254.140:51716) has finished values (channel 21790, queued 00:00:00, execute 00:01:13): fieldName=session.split id1=2110980282 id2=6419455724 threshold=100000 size=20 flags=sessions,sort-total,order-descending,ignore-cache where="time=\"2017-06-15 14:22:00\"-\"2017-06-15 20:21:59\""  
  47. # Source: INVESTIGATION - navigate -  metakey drill  
  48. # Query: time=\"2017-06-15 14:22:00\"-\"2017-06-15 20:21:59\  
  49. # Size of results (max returned elements - 0 is unbounded): 20  
  50. # Drill occured on Metakey: session.split  

Outcomes