Reporting Engine: Step 3. Configure Data Sources

This topic tells you how to:

  • Add a Data Source to a Reporting Engine
  • Set a Data Source as the Default Source

Add a Data Source to a Reporting Engine

This section contains the following procedures:

  • Basic Setup
  • Enable Jobs
  • Enable Kerberos Authentication

Basic Setup

To associate a data source with a Reporting Engine:

  1. In the Security Analytics menu, select Administration > Services.
  2. In the Services Grid, select a Reporting Engine service.
  3. Click   > View > Config.

    The Services Config View of Reporting Engine is displayed.

  4. On the Sources tab, click  > New Service.

    The New Service dialog is displayed.

  5. Fill in the fields as follows:

    1. In the Source Type drop-down menu, select Warehouse.
    2. In the Warehouse Source drop-down menu, select the warehouse data source. 
    3. In the Name field, enter the name of the Warehouse data source.

      Note: Do not use special characters such as &, ', ", <, and > when adding the data source. If the name contains special characters, the update to the Reporting Engine fails.

    4. In the HDFS Path field, enter the HDFS root path to which the Warehouse Connector writes the data.

      For example:
      Suppose /saw is the local mount point for HDFS that you configured while mounting NFS on the device where the Warehouse Connector service is installed to write to SAW. (For more information, see "Mount the Warehouse on the Warehouse Connector" in the RSA Analytics Warehouse (MapR) Configuration Guide.)

      If you create a directory named Ionsaw01 under /saw and provide /saw/Ionsaw01 as the corresponding Local Mount Path, then the corresponding HDFS root path is /Ionsaw01.

      The /saw mount point corresponds to the root path (/) in HDFS, so the Warehouse Connector writes the data to /Ionsaw01 in HDFS. If no data is available in this path, the following error is displayed:

      “No data available. Check HDFS path”

      Make sure that /Ionsaw01/rsasoc/v1/sessions/meta contains Avro files of the metadata before you perform the test connection (see the HDFS path check sketch after this procedure).

    5. Select the Advanced checkbox to use the advanced settings, and fill in the Database URL field with the complete JDBC URL used to connect to HiveServer2.

      For example:
      If Kerberos is enabled in Hive, the JDBC URL is:

      jdbc:hive2://<host>:<port>/<db>;principal=<Kerberos server principal>

      If SSL is enabled in Hive, the JDBC URL is:

      jdbc:hive2://<host>:<port>/<db>;ssl=true;sslTrustStore=<trust_store_path>;trustStorePassword=<trust_store_password>

      For more information on HiveServer2 clients, see https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients.

    6. If you are not using the advanced settings, enter values in the Host and Port fields.

      • In the Host field, enter the IP address of the host on which HiveServer2 is hosted.

        Note: You can use the virtual IP address of MapR only if HiveServer2 is running on all the nodes in the cluster.

      • In the Port field, enter the HiveServer2 port of the Warehouse data source. By default, the port number is 10000.
    7. In the Username and Password fields, enter the JDBC credentials used to access HiveServer2 (see the JDBC connection sketch after this procedure).

      Note: You can also use LDAP authentication with Active Directory. For instructions on enabling LDAP authentication mode, see Enable LDAP Authentication.
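
If you want to verify the HDFS path programmatically, the following is a minimal Java sketch using the Hadoop FileSystem client (hadoop-common on the classpath). The NameNode address and the /Ionsaw01 root path are placeholders based on the example above; substitute your own values. The same check can be done with hadoop fs -ls on the cluster.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FileStatus;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class HdfsMetaPathCheck {
      public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          // Placeholder: the HDFS NameNode (or MapR) address for your warehouse cluster.
          conf.set("fs.defaultFS", "hdfs://<namenode-host>:8020");

          // HDFS root path from the HDFS Path field, for example /Ionsaw01 when the Local Mount Path is /saw/Ionsaw01.
          Path metaDir = new Path("/Ionsaw01/rsasoc/v1/sessions/meta");

          try (FileSystem fs = FileSystem.get(conf)) {
              if (!fs.exists(metaDir)) {
                  System.out.println("No data available. Check HDFS path: " + metaDir);
                  return;
              }
              int avroCount = 0;
              for (FileStatus status : fs.listStatus(metaDir)) {
                  if (status.isFile() && status.getPath().getName().endsWith(".avro")) {
                      avroCount++;
                  }
              }
              System.out.println(avroCount + " Avro file(s) found under " + metaDir);
          }
      }
  }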
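
To sanity-check the Database URL, Host, Port, and JDBC credentials before saving the data source, the following is a minimal Java sketch of a plain HiveServer2 JDBC connection (hive-jdbc and its dependencies on the classpath). The host, database, username, and password are placeholders; the SSL and Kerberos URL variants shown above appear as comments.

  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.ResultSet;
  import java.sql.Statement;

  public class HiveConnectionTest {
      public static void main(String[] args) throws Exception {
          // HiveServer2 JDBC driver.
          Class.forName("org.apache.hive.jdbc.HiveDriver");

          // Host, port, and database from the Host and Port fields; 10000 is the default HiveServer2 port.
          String url = "jdbc:hive2://<host>:10000/<db>";
          // SSL variant:      jdbc:hive2://<host>:<port>/<db>;ssl=true;sslTrustStore=<trust_store_path>;trustStorePassword=<trust_store_password>
          // Kerberos variant: jdbc:hive2://<host>:<port>/<db>;principal=<Kerberos server principal>

          // JDBC credentials from the Username and Password fields.
          try (Connection conn = DriverManager.getConnection(url, "<username>", "<password>");
               Statement stmt = conn.createStatement();
               ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
              while (rs.next()) {
                  System.out.println(rs.getString(1));
              }
          }
      }
  }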

Continue to the next section, Enable Jobs, if you want to run warehouse analytics reports. If you do not want to run warehouse analytics reports, skip to Enable Kerberos Authentication.

Enable Jobs

To run warehouse analytics reports, perform this procedure.

  1. Select the Enable Jobs checkbox.

  2. Fill in the fields as follows:

    1. Select the type of HDFS from the HDFS Type drop-down menu.

      • If you select the Pivotal HDFS type, enter the following information:

        • HDFS Username: Enter the username that Reporting Engine should use when connecting to Pivotal. For standard Pivotal DCA clusters, this is 'gpadmin'.

        • HDFS Name: Enter the URL used to access HDFS. For example, hdfs://hdm1.gphd.local:8020.

        • HBase Zookeeper Quorum: Enter a comma-separated list of the host names on which the ZooKeeper servers are running.

        • HBase Zookeeper Port: Enter the port number for the ZooKeeper servers. The default port is 2181.

        • Input Path Prefix: Enter the output path of the Warehouse Connector (/sftp/rsasoc/v1/sessions/data/<year>/<month>/<date>/<hour>) up to the year directory. For example, /sftp/rsasoc/v1/sessions/data/.

        • Output Path Prefix: Enter the location in HDFS where the data science job results are stored.

        • Yarn Host Name: Enter the Hadoop YARN ResourceManager host name in the DCA cluster. For example, hdm3.gphd.local.

        • Job History Server: Enter the Hadoop Job History Server address in the DCA cluster. For example, hdm3.gphd.local:10020.

        • Yarn Staging Directory: Enter the staging directory for YARN in the DCA cluster. For example, /user.

        • Socks Proxy: In a standard DCA cluster, most of the Hadoop services run in a local private network that is not reachable from Reporting Engine. In that case, you must run a SOCKS proxy in the DCA cluster and allow access to the cluster from outside. For example, mdw.netwitness.local:1080.

      • If you select the MapR HDFS type, enter the following information:

        • MapR Host Name: Enter the public IP address of any one of the MapR warehouse hosts.

        • MapR Host User: Enter a UNIX username on the given host that has access to execute MapReduce jobs on the cluster. The default value is 'mapr'.

        • MapR Host Password (Optional): To set up password-less authentication, copy the public key of the "rsasoc" user from /home/rsasoc/.ssh/id_rsa.pub to the "authorized_keys" file of the warehouse host, located at /home/mapr/.ssh/authorized_keys, assuming that "mapr" is the remote UNIX user.

        • MapR Host Work Dir: Enter a path to which the given UNIX user (for example, "mapr") has write access.

          Note: Reporting Engine uses the work directory to remotely copy the Warehouse Analytics jar files and to start the jobs from the given host. Do not use "/tmp", to avoid filling up the system temporary space. The given work directory is managed remotely by Reporting Engine.

        • HDFS Name: Enter the URL used to access HDFS. For example, to access a specific cluster, maprfs:/mapr/<cluster-name>.

        • HBase Zookeeper Port: Enter the port number for the ZooKeeper servers. The default port is 5181.

        • Input Path Prefix: Enter the output path (/rsasoc/v1/sessions/data/<year>/<month>/<date>/<hour>) up to the year directory. For example, /rsasoc/v1/sessions/data/.

        • Input Filename: Enter the file name filter for Avro files. For example, sessions-warehouseconnector.

        • Output Path Prefix: Enter the location in HDFS where the data science job results are stored.
    2. Select the MapReduce Framework according to the HDFS type.

      Note: For the MapR HDFS type, select Classic as the MapReduce framework. For the Pivotal HDFS type, select Yarn as the MapReduce framework.

Next, enable Kerberos authentication.

Enable Kerberos Authentication

  1. Select the Kerberos Authentication checkbox if the Warehouse uses a Kerberos-enabled Hive server.

  2. Fill in the fields as follows:

    • Server Principal: Enter the principal used by the Hive server to authenticate with the Kerberos Key Distribution Center (KDC) server.

    • User Principal: Enter the principal that the Hive JDBC client uses to authenticate with the KDC server when connecting to the Hive server. For example, gpadmin@EXAMPLE.COM.

    • Kerberos Keytab File: View the Kerberos keytab file location configured in the Hive Configuration panel on the Reporting Engine: General Tab.

      Note: Reporting Engine supports only data sources configured with the same Kerberos credentials, that is, the same User Principal and keytab file.

    A keytab login sketch for a Kerberos-enabled Hive connection appears at the end of this procedure.

  3. Click Test Connection to test the connection with the values entered.
  4. Click Save.

    The added Warehouse data source is displayed in the Reporting Engine Sources tab.

  5. Click  Available Services.

    The Available Services dialog box is displayed.

  6. In the Available Services dialog box, select the service that you want to add as data source to the Reporting Engine and click OK.

    Security Analytics adds the service as a data source that is available for reports and alerts against this Reporting Engine.

    Note: This step is relevant only for an Untrusted model.
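
The following is a minimal Java sketch of how a Hive JDBC client logs in with a keytab before connecting to a Kerberos-enabled HiveServer2 (hive-jdbc and hadoop-common on the classpath). The principal, keytab path, host, and database are placeholders and must match the Server Principal, User Principal, and Kerberos Keytab File values configured above.

  import java.sql.Connection;
  import java.sql.DriverManager;

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.security.UserGroupInformation;

  public class KerberizedHiveConnection {
      public static void main(String[] args) throws Exception {
          // Switch the Hadoop security layer to Kerberos before logging in.
          Configuration conf = new Configuration();
          conf.set("hadoop.security.authentication", "kerberos");
          UserGroupInformation.setConfiguration(conf);

          // User Principal and keytab file (placeholders).
          UserGroupInformation.loginUserFromKeytab(
                  "gpadmin@EXAMPLE.COM", "/path/to/gpadmin.keytab");

          Class.forName("org.apache.hive.jdbc.HiveDriver");

          // The Server Principal goes into the JDBC URL, as in the Basic Setup advanced settings.
          String url = "jdbc:hive2://<host>:10000/<db>;principal=<Kerberos server principal>";
          try (Connection conn = DriverManager.getConnection(url)) {
              System.out.println("Connected as " + UserGroupInformation.getLoginUser().getUserName());
          }
      }
  }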

Set a Data Source as the Default Source

To set a data source to be the default source when you create reports and alerts:

  1. In the Security Analytics menu, select Administration > Services.
  2. In the Services Grid, select a Reporting Engine service.
  3. Click  > View > Config.

    The Services Config View of Reporting Engine is displayed.

  4. Select the Sources tab.

    The Services Config View is displayed with the Reporting Engine Sources tab open.

  5. Select the source that you want to be the default source (for example, Broker).
  6. Click the Set Default checkbox.

    Security Analytics defaults to this data source when you create reports and alerts against this Reporting Engine.
