Step 3: Configure Reporting Engine Data Sources

Document created by RSA Information Design and Development on Jul 29, 2016
 

This topic tells you how to:

  • Add a Data Source to a Reporting Engine
  • Set a Data Source as the Default Source

Add a Data Source to a Reporting Engine

Perform the following steps to associate a data source with a Reporting Engine:

  1. In the Security Analytics menu, select Dashboard > Administration > Services.
  2. In the Services Grid, select a Reporting Engine service.
  3. Click the settings icon > View > Config.

    The Services Config View of Reporting Engine is displayed.

  4. In the Sources tab, click the + (Add) drop-down > New Service.

    The New Service view is displayed.


  5. Perform the following steps:

    1. From the Source Type drop-down menu, select Warehouse.
    2. In the Warehouse Source drop-down menu, select the warehouse data source. 
    3. In the Name field, enter the name of the Warehouse data source.
    4. In the HDFS Path field, enter the HDFS root path to which the Warehouse Connector writes the data.
      For example:
      If /saw is the local mount point for HDFS that you configured while mounting NFS on the device where you installed the Warehouse Connector service to write to SAW (for more information, see Mount the Warehouse on the Warehouse Connector in the Warehouse (MapR) Configuration Guide), and you created a directory named Ionsaw01 under /saw and provided the corresponding Local Mount Path as /saw/Ionsaw01, then the corresponding HDFS root path is /Ionsaw01.

      The /saw mount point implies / as the root path for HDFS. The Warehouse Connector writes the data to /Ionsaw01 in HDFS. If there is no data available in this path, the following error is displayed:
      “No data available. Check HDFS path”
      Make sure that /Ionsaw01/rsasoc/v1/sessions/meta contains Avro files of the metadata before you perform a test connection; a minimal check is sketched below.
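      One quick way to perform such a check programmatically is to list the directory with the standard Hadoop FileSystem API. The following Java sketch is illustrative only and is not part of Reporting Engine; the file system URI and the .avro extension are assumptions that you should adjust to your deployment (for example, a maprfs: URI for MapR).

      import java.net.URI;
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileStatus;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class CheckMetaAvro {
          public static void main(String[] args) throws Exception {
              // Hypothetical file system URI; substitute your NameNode (or maprfs) address.
              FileSystem fs = FileSystem.get(URI.create("hdfs://namenode.example.com:8020"),
                                             new Configuration());
              Path metaDir = new Path("/Ionsaw01/rsasoc/v1/sessions/meta");
              boolean found = false;
              for (FileStatus status : fs.listStatus(metaDir)) {
                  // Assumes the metadata files carry an .avro extension.
                  if (status.isFile() && status.getPath().getName().endsWith(".avro")) {
                      System.out.println("Found Avro file: " + status.getPath());
                      found = true;
                  }
              }
              if (!found) {
                  System.out.println("No Avro files under " + metaDir + "; Test Connection is likely to fail.");
              }
          }
      }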
  6. Select the Advanced checkbox if you want to use the advanced settings; otherwise, skip to step 8.
  7. Perform the following steps:

    1. In the Database URL field, enter the complete JDBC URL used to connect to HiveServer2.
      For example:
      If Kerberos is enabled in Hive, the JDBC URL is:
      jdbc:hive2://<host>:<port>/<db>;principal=<Kerberos server principal>

      If SSL is enabled in Hive, the JDBC URL is:
      jdbc:hive2://<host>:<port>/<db>;ssl=true;sslTrustStore=<trust_store_path>;trustStorePassword=<trust_store_password>

      For more information on HiveServer2 clients, see https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients.
    2. In the Username and Password fields, enter the JDBC credentials used to access HiveServer2.
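      For reference, the same URL forms can be exercised outside Reporting Engine with the standard Hive JDBC driver. The following Java sketch is a minimal illustration with hypothetical host, database, and credentials; append the Kerberos or SSL parameters shown above as needed.

      import java.sql.Connection;
      import java.sql.DriverManager;
      import java.sql.ResultSet;
      import java.sql.Statement;

      public class HiveJdbcSmokeTest {
          public static void main(String[] args) throws Exception {
              // Plain HiveServer2 URL; host, port, database, and credentials are placeholders.
              String url = "jdbc:hive2://warehouse.example.com:10000/default";
              Class.forName("org.apache.hive.jdbc.HiveDriver");
              try (Connection conn = DriverManager.getConnection(url, "gpadmin", "changeme");
                   Statement stmt = conn.createStatement();
                   ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
                  while (rs.next()) {
                      System.out.println(rs.getString(1));
                  }
              }
          }
      }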
  8. In the Host field, enter the IP address of the host on which HiveServer2 is running.

    Note: You can use the virtual IP address of MapR only if HiveServer2 is running on all the nodes in the cluster.

  9. In the Port field, enter the HiveServer2 port of the Warehouse data source. By default, the port number is 10000.
  10. In the Username and Password fields, enter the JDBC credentials used to access HiveServer2.

    Note: You can also use LDAP mode of authentication using Active Directory. To enable LDAP authentication mode, see Enable LDAP Authentication.

  11. Select the Enable Jobs checkbox if you want to enable the job settings; otherwise, skip to step 12.


    Perform the following steps:

    1. In the HDFS Type drop-down menu, select the type of HDFS. The possible values are MapR or Pivotal. 
    2. For a Pivotal HDFS type, do the following:
      1. In the HDFS Username field, enter the username that Reporting Engine should claim when connecting to Pivotal. For standard Pivotal DCA clusters, this is typically ‘gpadmin’.
      2. In the HDFS Name field, enter the URL to access HDFS. For example, hdfs://hdm1.gphd.local:8020.
      3. In the Input Path Prefix field, enter the output path of the Warehouse Connector (/sftp/rsasoc/v1/sessions/data/<year>/<month>/<date>/<hour>) up to the year directory. For example, /sftp/rsasoc/v1/sessions/data/.
      4. In the Input Filename field, enter the file name filter for avro files. For example, sessions.
      5. In the Output Path Prefix field, enter the location where the data science job results are stored in HDFS.
      6. In the Yarn Host Name field, enter the Hadoop YARN ResourceManager host name in the DCA cluster. For example, hdm3.gphd.local
      7. In the Job History Server field, enter the Hadoop Job History Server address in the DCA cluster. For example, hdm3.gphd.local:10020
      8. In the Yarn Staging Directory field, enter the staging directory for YARN in the DCA cluster. For example, /user
      9. In the Socks Proxy field, enter the SOCKS proxy address. On a standard DCA cluster, most of the Hadoop services run in a local private network that is not reachable from Reporting Engine, so you must run a SOCKS proxy in the DCA cluster and allow access to the cluster from outside. For example, mdw.netwitness.local:1080
    3. For a MapR HDFS type, do the following:
      1. In the MapR Host Name field, enter the public IP address of any one of the MapR warehouse hosts.
      2. In the MapR Host User field, enter a UNIX username in the given host that has access to execute map-reduce jobs on the cluster. Default value is 'mapr'.
      3. (Optional) In the MapR Host Password field, enter the password of the MapR host user. Alternatively, to set up password-less authentication, copy the public key of the “rsasoc” user from /home/rsasoc/.ssh/id_rsa.pub to the “authorized_keys” file of the warehouse host, located at /home/mapr/.ssh/authorized_keys (assuming that “mapr” is the remote UNIX user).
      4. In the MapR Host Work Dir field, enter a path to which the given UNIX user (for example, “mapr”) has write access.

        Note: The work directory is used by Reporting Engine to remotely copy the Warehouse Analytics jar files and start the jobs from the given host name. Do not use “/tmp”, to avoid filling up the system temporary space. The given work directory is remotely managed by Reporting Engine.

      5. In the HDFS Name field, enter the URL to access HDFS. For example, to access a specific cluster, maprfs:/mapr/<cluster-name>.
      6. In the Input Path Prefix field, enter the output path (/rsasoc/v1/sessions/data/<year>/<month>/<date>/<hour>) up to the year directory. For example, /rsasoc/v1/sessions/data/.
      7. In the Input Filename field, enter the file name filter for avro files. For example, sessions-warehouseconnector.
      8. In the Output Path Prefix field, enter the location where the data science job results are stored in HDFS.

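    The job-settings fields above correspond closely to standard Hadoop client properties. The following Java sketch shows one plausible mapping for the Pivotal example values used in this topic; it is an assumption about how such a client could be configured, not Reporting Engine's actual implementation, and the host names are the hypothetical examples from the steps above.

    import org.apache.hadoop.conf.Configuration;

    public class DcaJobClientConfig {
        public static Configuration build() {
            Configuration conf = new Configuration();
            // HDFS Name: the default file system URI.
            conf.set("fs.defaultFS", "hdfs://hdm1.gphd.local:8020");
            // Yarn Host Name: the YARN ResourceManager in the DCA cluster.
            conf.set("yarn.resourcemanager.hostname", "hdm3.gphd.local");
            // Job History Server address.
            conf.set("mapreduce.jobhistory.address", "hdm3.gphd.local:10020");
            // Yarn Staging Directory.
            conf.set("yarn.app.mapreduce.am.staging-dir", "/user");
            // Socks Proxy: route Hadoop RPC through a SOCKS proxy when the cluster
            // services sit on a private network that Reporting Engine cannot reach directly.
            conf.set("hadoop.socks.server", "mdw.netwitness.local:1080");
            conf.set("hadoop.rpc.socket.factory.class.default",
                     "org.apache.hadoop.net.SocksSocketFactory");
            return conf;
        }
    }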

  12. Select the Kerberos Authentication checkbox if the Warehouse has a Kerberos-enabled Hive server.


    Perform the following steps:

    1. In the Server Principal field, enter the principal used by the Hive server to authenticate with the Kerberos Key Distribution Center (KDC) server.
    2. In the User Principal field, enter the principal that the Hive JDBC client uses to authenticate with the KDC server when connecting to the Hive server. For example, gpadmin@EXAMPLE.COM
    3. View the Kerberos keytab file location configured in the Hive Configuration panel on the Reporting Engine General tab.

      Note: Reporting Engine supports only data sources configured with the same Kerberos credentials, that is, the same User Principal and keytab file.
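      To illustrate how the Server Principal, User Principal, and keytab file fit together, the following Java sketch logs in from a keytab and opens a Kerberos-authenticated Hive JDBC connection. The host, database, principals, and keytab path are hypothetical; substitute the values configured for Reporting Engine.

      import java.sql.Connection;
      import java.sql.DriverManager;
      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.security.UserGroupInformation;

      public class KerberosHiveConnection {
          public static void main(String[] args) throws Exception {
              // Enable Kerberos authentication for the Hadoop security layer.
              Configuration conf = new Configuration();
              conf.set("hadoop.security.authentication", "kerberos");
              UserGroupInformation.setConfiguration(conf);
              // Log in with the User Principal and keytab file (hypothetical path).
              UserGroupInformation.loginUserFromKeytab(
                      "gpadmin@EXAMPLE.COM", "/path/to/gpadmin.keytab");
              // The principal in the URL is the Hive server's principal (Server Principal field).
              String url = "jdbc:hive2://warehouse.example.com:10000/default;"
                      + "principal=hive/warehouse.example.com@EXAMPLE.COM";
              Class.forName("org.apache.hive.jdbc.HiveDriver");
              try (Connection conn = DriverManager.getConnection(url)) {
                  System.out.println("Kerberos-authenticated Hive connection succeeded.");
              }
          }
      }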

    4. Click Test Connection to test the connection with the values entered.

    5. Click Save.

      The added Warehouse data source is displayed in the Reporting Engine Sources tab.

    6. Click the + (Add) drop-down > Available Services.

      The Available Services dialog box is displayed.


  13. In the Available Services dialog box, select the service that you want to add as a data source to the Reporting Engine and click OK.

    Security Analytics adds this as a data source available to reports and alerts against this Reporting Engine.


    Note: This step is relevant only for an Untrusted model.

Set a Data Source as the Default Source

To set a data source to be the default source when you create reports and alerts:

  1. In the Security Analytics menu, select Dashboard > Administration > Services.
  2. In the Services Grid, select a Reporting Engine service.
  3. Select the settings icon > View > Config.
    The Services Config View of Reporting Engine is displayed.
  4. Select the Sources tab.
    The Services Config View is displayed with the Reporting Engine Sources tab open.
  5. Select the source that you want to be the default source (for example, Broker).
  6. Click the Set Default checkbox.
    Security Analytics defaults to this data source when you create reports and alerts against this Reporting Engine.
