Add Warehouse as a Data Source to Reporting Engine

Document created by RSA Information Design and Development on Jul 29, 2016
Version 1

This topic provides instructions on how to:

  • Add a Warehouse Data Source to Reporting Engine
  • Set Warehouse Data Source as the Default Source


Make sure that:

  • HiveServer2 is running on all the Warehouse nodes. You can use one of the following commands to check the status of the Hive server:
    status hive2 (MapR deployments)
    service hive-server2 status (Pivotal HD deployments)
  • Warehouse Connector is configured to write data to the warehouse deployments.
  • If Kerberos authentication is enabled for HiveServer2, the keytab file is copied to the /home/rsasoc/rsa/soc/reporting-engine/conf/ directory on the Reporting Engine host.

    Note: Make sure that the rsasoc user has read permission on the keytab file.

    Also, make sure that you update the keytab file location in the Kerberos Keytab File parameter in the Reporting Engine Service Config View.
  • The default Kerberos configuration file is located at /etc/krb5.conf on the Reporting Engine host. You can modify the configuration file to provide details for Kerberos realms and other Kerberos-related parameters.
  • The host name (or FQDN) and IP address of the Pivotal nodes and Warehouse Connector are added to the DNS server. If the DNS server is not configured, add the host name (or FQDN) and IP address of the Pivotal nodes and Warehouse Connector to the /etc/hosts file on the host on which the Warehouse Connector service is installed.
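
The keytab prerequisite above can be checked with a short script before you configure the data source. The following is a minimal sketch, assuming it is run as the rsasoc user on the Reporting Engine host. The keytab file name hive.keytab and the helper name keytab_is_readable are hypothetical and are not part of the product.

```python
import os

# Directory named in the prerequisites; the file name used below is a hypothetical example.
KEYTAB_DIR = "/home/rsasoc/rsa/soc/reporting-engine/conf/"


def keytab_is_readable(path):
    """Return True if path is an existing regular file that the current user can read."""
    return os.path.isfile(path) and os.access(path, os.R_OK)


if __name__ == "__main__":
    keytab = os.path.join(KEYTAB_DIR, "hive.keytab")  # hypothetical file name
    if keytab_is_readable(keytab):
        print("keytab is readable:", keytab)
    else:
        print("keytab is missing or not readable:", keytab)
```

Because os.access evaluates permissions for the user running the script, the result is only meaningful when the script runs as the same user as the Reporting Engine service.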

Perform the following steps to associate a Warehouse data source with Reporting Engine:

  1. In the Security Analytics menu, select Administration > Services.
  2. In the Services grid, select the Reporting Engine service.
  3. Click the settings icon and select View > Config.
  4. Select the Sources tab.
    The Service Config View is displayed with the Reporting Engine Sources tab open.
  5. Click the add icon and select New Service.
    The New Service page is displayed.
  6. In the Source Type drop-down menu, select Warehouse.
  7. In the Warehouse Source drop-down menu, select the warehouse data source. 
  8. In the Name field, enter the name of the Warehouse data source.
  9. In the HDFS Path field, enter the HDFS root path to which the Warehouse Connector writes the data.
    For example:
    Suppose /saw is the local mount point for HDFS that you configured when mounting NFS on the device where the Warehouse Connector service is installed to write to SAW. For more information, see Mount the Warehouse on the Warehouse Connector in the Warehouse (MapR) Configuration Guide. If you created a directory named lonsaw01 under /saw and provided the corresponding Local Mount Path as /saw/lonsaw01, then the corresponding HDFS root path is /lonsaw01.
    The /saw mount point implies "/" as the root path for HDFS. The Warehouse Connector writes the data to /lonsaw01 in HDFS. If there is no data available in this path, the following error is displayed:
    “No data available. Check HDFS path”
    Make sure that /lonsaw01/rsasoc/v1/sessions/meta contains the avro files of the metadata before you test the connection. If the /lonsaw01/rsasoc/v1/sessions/meta directory does not exist, a new folder will be created in the same location.
  10. Select the Advanced checkbox to use the advanced settings, or skip to Step 11.
    Perform the following:
    1. In the Database URL field, enter the complete JDBC URL used to connect to HiveServer2.
       For example:
      If Kerberos is enabled in Hive, the JDBC URL is:
      jdbc:hive2://<host>:<port>/<db>;principal=<Kerberos server principal>
      If SSL is enabled in Hive, the JDBC URL is:
    2. In the Username and Password fields, enter the JDBC credentials used to access HiveServer2.
  11. In the Host field, enter the IP address of the host on which HiveServer2 is hosted.

    Note: You can use the virtual IP address of MapR only if HiveServer2 is running on all the nodes in the cluster.

  12. In the Port field, enter the HiveServer2 port of the Warehouse data source. By default, the port number is 10000.
  13. In the Username and Password fields, enter the JDBC credentials used to access HiveServer2.

    Note: You can also use LDAP mode of authentication using Active Directory. To enable LDAP authentication mode, see Enable LDAP Authentication.

  14. Select the Enable Jobs checkbox to enable the job settings, or skip to Step 15.

    Perform the following steps:
    1. In the HDFS Type drop-down menu, select the type of HDFS. The possible values are MapR or Pivotal. 
    2. For a Pivotal HDFS type, do the following:
      1. In the HDFS Username field, enter the username that Reporting Engine should claim when connecting to Pivotal. For standard Pivotal DCA clusters, this is ‘gpadmin’.
      2. In the HDFS Name field, enter the URL to access HDFS. For example,  hdfs://hdm1.gphd.local:8020.
      3. In the Input Path Prefix field, enter the output path of the Warehouse Connector (/sftp/rsasoc/v1/sessions/data/<year>/<month>/<date>/<hour>) until the year directory. For example, /sftp/rsasoc/v1/sessions/data/.
      4. In the Input Filename field, enter the file name filter for avro files. For example, sessions.
      5. In the Output Path Prefix field, enter the location where the data science job results are stored in HDFS.
      6. The ETL - Output Directory field displays the path where the ETL jobs are stored.
      7. In the Yarn Host Name field, enter the Hadoop YARN resource-manager host name in the DCA cluster. For example, hdm3.gphd.local
      8. In the Job History Server field, enter the Hadoop job-history-server address in the DCA cluster. For example, hdm3.gphd.local:10020
      9. In the Yarn Staging Directory field, enter the staging directory for YARN in the DCA cluster. For example, /user
      10. In the Socks Proxy field, enter the address of the SOCKS proxy. In a standard DCA cluster, most of the Hadoop services run in a local private network that is not reachable from Reporting Engine, so you must run a SOCKS proxy in the DCA cluster and allow access to the cluster from outside. For example, mdw.netwitness.local:1080
    3. For a MapR HDFS type, do the following:
      1. In the MapR Host Name field, enter the public IP address of any one of the MapR Warehouse hosts.
      2. In the MapR Host User field, enter a UNIX username on the given host that has access to execute map-reduce jobs on the cluster. The default value is 'mapr'.
      3. (Optional) In the MapR Host Password field, enter the password of the UNIX user. To set up password-less authentication instead, copy the public key of the “rsasoc” user from /home/rsasoc/.ssh/ to the “authorized_keys” file of the Warehouse host, located at /home/mapr/.ssh/authorized_keys, assuming that “mapr” is the remote UNIX user.
      4. In the MapR Host Work Dir field, enter a path to which the given UNIX user (for example, “mapr”) has write access.

        Note: The work directory is used by Reporting Engine to remotely copy the Warehouse Analytics jar files and start the jobs from the given host name. Do not use “/tmp”, to avoid filling up the system temporary space. The given work directory will be remotely managed by Reporting Engine.

      5. In the HDFS Name field, enter the URL to access HDFS. For example, to access a specific cluster, maprfs:/mapr/<cluster-name>.
      6. In the Input Path Prefix field, enter the output path (/rsasoc/v1/sessions/data/<year>/<month>/<date>/<hour>) until the year directory. For example, /rsasoc/v1/sessions/data/.
      7. In the Input Filename field, enter the file name filter for avro files. For example, sessions-warehouseconnector.
      8. In the Output Path Prefix field, enter the location where the data science job results are stored in HDFS.
  15. Select the Kerberos Authentication checkbox if the Warehouse has a Kerberos-enabled Hive server.
    Perform the following steps:
    1. In the Server Principal field, enter the principal used by the Hive server to authenticate with the Kerberos Key Distribution Center (KDC) server.
    2. In the User Principal field, enter the principal that the Hive JDBC client uses to authenticate with the KDC server to connect to the Hive server.

      Note: Reporting Engine supports only data sources that are configured with the same Kerberos credentials, that is, the same User Principal and keytab file.

  16. Click Test Connection to test the connection.
  17. Click Save.
    The added Warehouse data source is displayed in the Reporting Engine Sources tab.
    If you want to set the added Warehouse data source as the default source for the Reporting Engine, select the added Warehouse data source and click the set default icon.
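
The relationship between the Local Mount Path and the HDFS Path described in step 9 can be illustrated with a small helper. This is a sketch only: the function names hdfs_root_path and meta_directory and the directory name example01 are hypothetical, and /saw is assumed to be the local mount point that corresponds to "/" in HDFS.

```python
import posixpath


def hdfs_root_path(local_mount_path, mount_point="/saw"):
    """Map a Local Mount Path under mount_point to the HDFS root path.

    Assumes the mount point corresponds to "/" in HDFS, as in step 9.
    """
    local = posixpath.normpath(local_mount_path)
    mount = posixpath.normpath(mount_point)
    # Reject paths that are not located under the mount point.
    if local != mount and not local.startswith(mount.rstrip("/") + "/"):
        raise ValueError(f"{local_mount_path} is not under {mount_point}")
    relative = posixpath.relpath(local, mount)
    return "/" if relative == "." else "/" + relative


def meta_directory(hdfs_root):
    """Return the directory expected to hold the avro files of the metadata."""
    return posixpath.join(hdfs_root, "rsasoc/v1/sessions/meta")


root = hdfs_root_path("/saw/example01")  # hypothetical directory name
print(root)                  # /example01
print(meta_directory(root))  # /example01/rsasoc/v1/sessions/meta
```

The second value is the directory that should already contain avro files before you click Test Connection.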


Security Analytics adds the Warehouse as a data source available to reports and alerts against this Reporting Engine.
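
For the Advanced settings in step 10, the Kerberos form of the Database URL can be assembled from its parts. The sketch below only formats the string shown in that step. The function name hive_jdbc_url and all sample values (host, database, principal) are hypothetical, and the port defaults to 10000 as stated in step 12.

```python
def hive_jdbc_url(host, port=10000, db="default", principal=None):
    """Build a HiveServer2 JDBC URL, optionally with a Kerberos server principal."""
    url = f"jdbc:hive2://{host}:{port}/{db}"
    if principal:
        # Appended only when Kerberos is enabled for HiveServer2.
        url += f";principal={principal}"
    return url


# Hypothetical sample values, for illustration only.
print(hive_jdbc_url("hive01.example.com",
                    principal="hive/hive01.example.com@EXAMPLE.COM"))
```

The principal passed here should match the value entered in the Server Principal field in step 15.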
