RSA Admin

Parsing content that contains a lot of taboo characters

Discussion created by RSA Admin Employee on Aug 29, 2008

Occasionally, we run across flat files that are essentially logs in an XML or HTML-like format.  These will contain a LOT of those nasty < and > characters that we know we can't code up literally.  Here's a snippet of such a line I was given recently:

 

<state>TX</state><city>Plano</city>
 

The goal was to parse this such that the string "TX" is read into a variable named "state", and the string "Plano" is read into a second variable called "city".  Everything else is to be treated as fixed text.  So, how would you code that up in the content parameter?

 

Here's the solution:

&lt;&lt;state&gt;&lt;state&gt;&lt;&lt;/state&gt;&lt;&lt;city&gt;&lt;city&gt;&lt;&lt;/city&gt;

 

The variables are the single-bracketed &lt;state&gt; and &lt;city&gt; in the middle of each pair. Everything else, including each doubled &lt;&lt;, is fixed text.

 

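The key is to "escape" the left bracket (<) characters by doubling up the ones you want to treat as fixed text. Since each < is written as &lt; in the content parameter, a fixed-text < ends up as &lt;&lt;, while a single &lt; starts a variable name. Refer to your UDS class slides for the details on escaping the < and > brackets.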
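
If you want to convince yourself of how the doubling rule reads, here is a rough Python sketch that mimics it. It is only an illustration, not how the product parses content internally: the function name template_to_regex and the three rules in its docstring are assumptions made for this example, based purely on the snippet above.

```python
import html
import re


def template_to_regex(content: str) -> re.Pattern:
    """Turn a content template into a regex (hypothetical helper).

    Assumed rules, for illustration only:
      '<<'      -> a literal '<' (fixed text)
      '<name>'  -> a variable called 'name'
      otherwise -> fixed text, matched exactly
    """
    parts = []
    i = 0
    while i < len(content):
        if content.startswith("<<", i):
            # Doubled bracket: treat as one literal '<'.
            parts.append(re.escape("<"))
            i += 2
        elif content[i] == "<":
            # Single bracket: everything up to the next '>' is a variable name.
            end = content.index(">", i)
            name = content[i + 1:end]
            parts.append(f"(?P<{name}>.*?)")
            i = end + 1
        else:
            # Any other character is fixed text.
            parts.append(re.escape(content[i]))
            i += 1
    return re.compile("".join(parts))


# The content string exactly as posted, with XML entities.
content_xml = (
    "&lt;&lt;state&gt;&lt;state&gt;&lt;&lt;/state&gt;"
    "&lt;&lt;city&gt;&lt;city&gt;&lt;&lt;/city&gt;"
)

# Decode the entities first: this yields <<state><state><</state>...
pattern = template_to_regex(html.unescape(content_xml))

match = pattern.fullmatch("<state>TX</state><city>Plano</city>")
print(match.groupdict())  # {'state': 'TX', 'city': 'Plano'}
```

Checking for the doubled bracket before the single one is what keeps a fixed-text tag like <state> from being mistaken for a variable.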

Message Edited by MattMarchand on 08-29-2008 12:59 PM