Many customers provide comma separated value (CSV) files for collection of their identity, account, entitlement, and role data. RSA Identity Governance & Lifecycle leverages a type 4 JDBC driver called HXTT Text (CSV) Pure Java JDBC Driver to be able to collect data from CSV files as if the data in the files were in database tables. This has proven to be a very powerful tool for manipulating data coming from a CSV file or even joining data across multiple CSV files. However, there have been issues using this driver to collect data due to some of the SQL issues associated with using a SQL language to access data in a text file vs. a true relational database. Some issues are due to the fact that a text file does not have metadata around the columns of data within the file so declared datatypes for the data are needed. There are also issues around performance; particularly when joining across files or doing complex SQL due to the fact that a CSV file is not a relational database and is not going to perform as one.
It is highly recommended to test queries in collectors against the data source before running collectors. When the data source is a relational database such as Oracle or Microsoft SQL Server, the collector SQL queries can be testing using a tool such as SQL*Plus or SQL Developer. When the collector is an LDAP collector, an LDAP browser can be used to test the LDAP filters in the collector definitions. When using a text-based CSV database driver, one way to test the queries in the collector definition is to use a tool that supports using any JDBC type 4 driver to query the data source (CSV file.) Another option, described in this RSA Knowledge Base Article, is to use the HXTT Text (CSV) Pure Java JDBC Driver test tool built into the JDBC driver.
In this RSA Knowledge Base Article, we are referring to a trial version of the HXTT Text (CSV) Pure Java JDBC Driver. This version has limited functionality but is useful for testing SQL queries from a desktop computer. For supported SQL syntax documentation, license restrictions, or information on fully functional licensed versions, please refer to the official HXTT site.
Requirements to run the HXTT testing tool
The testing tool is simply a Java program with a user interface for typing in SQL queries and viewing the data results of that query obtained using the HXTT driver. To run the the tool you will need:
A machine on which you can see a JAVA GUI. This test tool has a graphical interface so you either need to run it on a Windows machine or under X Windows if you are on Linux
Java JRE 1.5 or later
The HXTT JDBC type 4 driver jar
The limited trial version can be downloaded from the HXTT website. When you clink on the link, the zip file will automatically download to your Windows Download folder.
Refer to the HXTT site for detailed documentation and license restrictions.
The CSV text files from which you want to collect data
Be sure that these files have gone through dos2unix if you are going to run the tool on an appliance/sandbox or have gone through unix2dos if you are going to run them on a Windows machine.
Setting up and using the HXTT testing tool
NOTE: This example is on Windows.
Make sure your CSV files are in the folder referred to by the JDBC URL. For this example, which is on Windows, place the CSV file(s) in your temp directory (C:\temp).
Remember to ensure you run unix2dos on the CSV file (if you are on Windows) or dos2unix (if you are on Linux) just to make sure the file is in the correct format. In the example here, there are four CSV files saved under C:\temp which are also on the appliance. These files are:
From a command line run prompt, run the following command from the same directory where the Text_JDBC30.jar file is located. Note, this assumes java is in your path:
java -jar Text_JDBC30.jar
If the .jar extension is associated with the java executable, you may simply need to double click on Text_JDBC30.jar.
A GUI will appear as shown in the screenshot below:
Type in any name in theURL Namefield. In this example we called it My Temp Directory. This field indicates the location of where the CSV files will be stored. In this case they will be stored on Windows in C:\temp.
In the URL Info field, type in the appropriate URL for the JDBC driver. This is going to be very much like the JDBC URL used in the collector. If you are running this tool from a Linux box, use the exact JDBC URL that is used in the collector. For example, the JDBC URL of:
means that the driver will look in the /home/oracle/database/SampleData/Demo directory for any CSV files.
On Windows, set the URL based on a local directory where the CSV files live. For example, if using C:\temp as in this example, the JDBC URL would look like:
In either case, be sure to set the _CSV_Header param to true or false depending on whether the CSV files have a header row in them or not.
Press the Add Config button to add this configuration to the list of server configurations on the left.
Select your configuration (e. g., My Temp Directory) and select View Monitor. The following window displays:
Click on Server Catalogs. You will now see:
Click the Build Conn button. The Build Conn button makes a quasi connection to the folder.
Type in any SQL query in the white box under the Build Conn button and press Execute. For example, run any of the individual Entitlement Data Collector (EDC) queries to see what they return. The execute button runs the query and displays the result in the window below. Essentially the query in the white box is executed and the result set is shown in the box below the query. The following is an example of a query that could be configured in an EDC:
If you are getting a parse error similar to the error below and believe you are using valid SQL syntax according to the HXTT documentation, it is suggested to look at the SQL statement in a HEX editor to look for invalid characters.
There are additional URL parameters available to specify different CSV file properties such as a different column separator other than a comma (_CSV_Separator). Refer to the Advanced Programming documentation from HXTT.