I call the search parser the 'fight club' parser, because if no one is supposed to talk about fight club, how can you tell if someone does indeed talk about fight club? And what if someone talks about fight club over chat, email, ftp, web, proxy, SQL query or telnet?
Decoders ship with a default search parser that matches regular expression strings and keywords within any clear-text protocol. The search parser is not like a regular parser: it searches the entire byte stream of a session for matches, whereas standard parsers key off protocol tokens and parsing logic.
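Conceptually, the difference can be sketched in a few lines of Python. This is a toy illustration, not the parser's actual implementation; the patterns and category names are invented:

```python
import re

# Toy illustration of byte-stream searching: unlike a token-based protocol
# parser, every pattern is tried against the raw payload of the session,
# regardless of what protocol carried it.
PATTERNS = {
    "ssn": re.compile(rb"\b\d{3}-\d{2}-\d{4}\b"),
    "keyword": re.compile(rb"fight club", re.IGNORECASE),
}

def scan_payload(payload: bytes):
    """Return (category, matched_bytes) pairs found anywhere in the payload."""
    return [(cat, m.group(0))
            for cat, pat in PATTERNS.items()
            for m in pat.finditer(payload)]
```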
By default the search parser identifies credit cards, SSNs and Employee Identification Numbers. But you can easily add your own terms and phrases.
Your 'fight club' could be anything that personnel on your network aren't allowed to talk about. Any keywords or phrases can be used as static phrases, and any reg-ex pattern can be used alone or in conjunction with keywords to detect variable data like ProductID numbers or UserIDs.
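For example, a keyword can anchor a regular expression so you only match the variable data in context. The "ProductID" prefix and the ID format below are invented for illustration:

```python
import re

# Hypothetical pattern: the literal keyword "ProductID" anchors a variable
# ID whose format (PROD- plus six digits) is assumed for this sketch.
product_id = re.compile(r"\bProductID[:=\s]+PROD-\d{6}\b")

def leaks_product_id(text: str) -> bool:
    """True if the text contains a ProductID in the assumed format."""
    return product_id.search(text) is not None
```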
Some instances I've seen in the field include:
- Former employees recruiting current employees
- Leaks to Wall Street Journal and Washington Post reporters
- News of upcoming IPOs
- Insider information about an ongoing SEC investigation
- Upcoming release of new product brands and their code names
- Information about scandals involving corporate officers
- Confidential corporate information pertaining to e-discovery and litigation
The search parser is more resource intensive than any other parser; it can take as much processing time as all of the other parsers combined, since it examines every session for matches. However, most modern appliances have the processing power to handle the search parser without much of an impact on performance.
To get started, enable the Search parser on your decoders.
You should also make sure your concentrators and brokers are configured to index the values of the match and found keys, as this makes queries and results faster. You do this by editing the index-concentrator.xml and index-broker.xml files on the respective appliances, changing the index parameter for those keys from IndexNone or IndexKeys to IndexValues.
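The entries end up looking something like this. The attribute names follow the usual index file layout, but treat the exact key names and attributes as assumptions and match them to what is already in your file:

```xml
<!-- index-concentrator.xml (illustrative only; verify key names and
     attributes against the entries already present in your file) -->
<key description="Found" name="found" format="Text" level="IndexValues"/>
<key description="Match" name="match" format="Text" level="IndexValues"/>
```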
IMPORTANT NOTE: Any changes to the index will require a service restart for the indexing changes to take effect.
Next, edit the search.ini file on your decoders to include the keywords, phrases, and service types you want to examine. The search.ini file is fully documented, including how to take advantage of case sensitivity. If you need help coming up with regular expressions for the variables you want to match against, Google is a great resource.
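As a rough sketch, an entry might look like the following. The section and option names here are assumptions for illustration; the comments inside search.ini itself document the real syntax, so defer to those:

```ini
; Illustrative only -- follow the documented syntax in your own search.ini
[Fight Club Watch]
Keywords=fight club;project mayhem
CaseSensitive=false
Regex=PROD-\d{6}
Services=25,80,110
```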
After you have your search.ini edited, save it and push it to your other decoders and then reload the parsers. The found key will begin populating with your categories (represented within the brackets) and the match key will populate with the values that matched your search parameters.
If you get too many hits, refine your reg-ex or search terms to more accurately pinpoint what you hope to match. Thousands of hits on a single value become too cumbersome for analysts to investigate. (I once tried to match on British driver's licence numbers using a reg-ex I found on Google and got a ridiculous number of hits. I had to disable that search string because it just wasn't effective.)
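A quick way to vet a pattern before pushing it to production is to run it over a sample of captured text and compare hit counts. Both patterns below are invented; the point is how much an anchoring prefix narrows the results:

```python
import re

# Compare an over-broad pattern against one anchored to a known prefix.
# The invoice-number format is assumed for this sketch.
loose = re.compile(r"\d{6,10}")        # any run of 6-10 digits
tight = re.compile(r"\bINV-\d{8}\b")   # assumed invoice-number format

sample = "order 12345678, ref INV-20240101, phone 5551234567"
print(len(loose.findall(sample)), len(tight.findall(sample)))  # 3 1
```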
As noted above, the search parser is processor and memory intensive. On older gear running near gigabit throughput, enabling the search parser can cause packet loss. Even on newer gear with more power to handle the searches, some packet loss may occur if throughput is high and other packet extraction processes are running, such as Informer Visualizations or Spectrum File Analysis. Keep an eye on your packet drop stats for a few days after enabling the search parser to ensure you aren't experiencing any performance degradation.