Reporting Engine: Add Warehouse as a Data Source


This topic provides instructions on how to:

  • Add a Warehouse Data Source to Reporting Engine
  • Set Warehouse Data Source as the Default Source

Note: The Hortonworks data source is supported in Security Analytics version 10.6.2.1 or later.

Prerequisites

Make sure that:

  • The Hive server is running on all the Warehouse nodes. You can use the following commands to check the status of the Hive server:
    status hive2 (MapR deployments)
    service hive-server2 status (Pivotal HD deployments)
  • Warehouse Connector is configured to write data to the warehouse deployments.
  • If Kerberos authentication is enabled for HiveServer2, make sure that the keytab file is copied to the /home/rsasoc/rsa/soc/reporting-engine/conf/ directory on the Reporting Engine host (see the sketch after this list).

    Note: Make sure that the rsasoc user has read permission on the keytab file.

    Also, make sure that you update the keytab file location in the Kerberos Keytab File parameter in the Reporting Engine Service Config view.

  • The default Kerberos configuration file is located at /etc/krb5.conf on the Reporting Engine host. You can modify this configuration file to provide details for Kerberos realms and other Kerberos-related parameters.
  • The host name (or FQDN) and IP address of the Pivotal nodes and the Warehouse Connector are added to the DNS server. If a DNS server is not configured, add the host name (or FQDN) and IP address of the Pivotal nodes and the Warehouse Connector to the /etc/hosts file on the host where the Warehouse Connector service is installed.
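
The following is a minimal command-line sketch of the keytab and /etc/hosts prerequisites above. The keytab file name, host names, and IP addresses are examples only; substitute the values for your deployment.

    # Copy the keytab file to the Reporting Engine host (example file name)
    scp rsasoc.keytab root@reporting-engine.example.com:/home/rsasoc/rsa/soc/reporting-engine/conf/

    # On the Reporting Engine host, give the rsasoc user read access to the keytab
    chown rsasoc:rsasoc /home/rsasoc/rsa/soc/reporting-engine/conf/rsasoc.keytab
    chmod 400 /home/rsasoc/rsa/soc/reporting-engine/conf/rsasoc.keytab

    # If no DNS server is configured, add entries to /etc/hosts on the
    # Warehouse Connector host (example addresses and names)
    echo "192.0.2.11  pivotal-node01.example.com  pivotal-node01" >> /etc/hosts
    echo "192.0.2.21  warehouse-connector.example.com  warehouse-connector" >> /etc/hosts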

Prerequisites for Hortonworks Data Source (HDP)

Note: Make sure you download hive-jdbc-1.2.1-with-full-dependencies.jar from RSA Link (https://community.rsa.com/docs/DOC-67251). This jar contains the Hive 1.2.1 driver that the Reporting Engine uses to connect to a Hive 1.2.1 HiveServer2.

  1. SSH to the Security Analytics server.
  2. In the /opt/rsa/soc/reporting-engine/plugins/ folder, take a backup of the following jar:
    hive-jdbc-0.12.0-with-full-dependencies.jar or hive-jdbc-1.0.0-mapr-1508-standalone.jar
  3. Remove the following jar:
    hive-jdbc-0.12.0-with-full-dependencies.jar or hive-jdbc-1.0.0-mapr-1508-standalone.jar
  4. In the /opt/rsa/soc/reporting-engine/plugins folder, copy the following jar using WinSCP:
    hive-jdbc-1.2.1-with-full-dependencies.jar
  5. Restart the Reporting Engine service (a command-line sketch of steps 2 through 5 appears after this list).
  6. Log in to Security Analytics UI.
  7. Select the Reporting Engine service and select View > Explore.
  8. In hiveConfig, set the EnableSmallSplitBasedSchemaLiteralCreation parameter to true.
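
A minimal command-line sketch of steps 2 through 5, run on the Security Analytics server. The backup directory and the Reporting Engine service name (rsasoc_re) are assumptions; adjust them for your environment.

    cd /opt/rsa/soc/reporting-engine/plugins/

    # Step 2: back up whichever of the existing Hive JDBC jars is present
    mkdir -p /root/re-plugins-backup
    cp hive-jdbc-0.12.0-with-full-dependencies.jar /root/re-plugins-backup/ 2>/dev/null
    cp hive-jdbc-1.0.0-mapr-1508-standalone.jar /root/re-plugins-backup/ 2>/dev/null

    # Step 3: remove the old jar from the plugins folder
    rm -f hive-jdbc-0.12.0-with-full-dependencies.jar hive-jdbc-1.0.0-mapr-1508-standalone.jar

    # Step 4: copy in the jar downloaded from RSA Link
    # (transferred to the server beforehand, for example with WinSCP)
    cp /root/hive-jdbc-1.2.1-with-full-dependencies.jar .

    # Step 5: restart the Reporting Engine service (service name assumed)
    service rsasoc_re restart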

Procedure

Perform the following steps to associate a Warehouse data source with Reporting Engine:

  1. In the Security Analytics menu, select Administration > Services.
  2. In the Services grid, select the Reporting Engine service.
  3. Select View > Config.
  4. Click the Sources tab.

    The Service Config view is displayed with the Reporting Engine Sources tab open.

  5. Click the + icon and select New Service.

    The New Service dialog is displayed.

    For HDP

  6. In the Source Type drop-down menu, select Warehouse.
  7. In the Warehouse Source drop-down menu, select the warehouse data source. 
  8. In the Name field, enter the name of the Warehouse data source.
  9. In the HDFS Path field, enter the HDFS root path to which the Warehouse Connector writes the data.

    For example:
    If /saw is the local mount point for HDFS that you configured while mounting NFS on the device where the Warehouse Connector service is installed to write to SAW (for more information, see Mount the Warehouse on the Warehouse Connector in the RSA Analytics Warehouse (MapR) Configuration Guide), and you have created a directory named Ionsaw01 under /saw with its Local Mount Path set to /saw/Ionsaw01, then the corresponding HDFS root path is /Ionsaw01.

    The /saw mount point corresponds to the root path (/) of HDFS, so the Warehouse Connector writes the data to /Ionsaw01 in HDFS. If there is no data available in this path, the following error is displayed:

    “No data available. Check HDFS path”

    Make sure that /Ionsaw01/rsasoc/v1/sessions/meta contains Avro files of the meta data before performing the test connection (see the verification sketch after this procedure).

  10. Select the Advanced checkbox to use the advanced settings, and fill in the Database URL field with the complete JDBC URL used to connect to HiveServer2.

    For example:
    If Kerberos is enabled in Hive, the JDBC URL is:

    jdbc:hive2://<host>:<port>/<db>;principal=<Kerberos server principal>

    If SSL is enabled in Hive, the JDBC URL is:

    jdbc:hive2://<host>:<port>/<db>;ssl=true;sslTrustStore=<trust_store_path>;trustStorePassword=<trust_store_password>

    For more information on HiveServer2 clients, see https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients. (A connection-test sketch appears after this procedure.)

  11. If not using the advanced settings, enter the values for the Host and Port.

    • In the Host field, enter the IP address of the host on which HiveServer2 is hosted.

      Note: You can use the virtual IP address of MapR only if HiveServer2 is running on all the nodes in the cluster.

    • In the Port field, enter the HiveServer2 port of the Warehouse data source. By default, the port number is 10000.
  12. In the Username and Password fields, enter the JDBC credentials used to access HiveServer2.

    Note: You can also use LDAP mode of authentication using Active Directory. For instructions to enable LDAP authentication mode, see Enable LDAP Authentication.

  13. To run warehouse analytics reports, see Enable Jobs in Step 3. Configure Reporting Engine Data Sources.
  14. To enable Kerberos authentication, see Enable Kerberos Authentication in Step 3. Configure Reporting Engine Data Sources.
  15. If you want to set the added Warehouse data source as the default source for the Reporting Engine, select it and click the set default icon.
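
A minimal sketch for verifying the HDFS path (step 9) and testing connectivity to HiveServer2 (steps 10 through 12). The host names, mount point, database name, Kerberos principal, and credentials shown here are examples only; substitute your own values.

    # Step 9: confirm that Avro meta files exist under the HDFS root path.
    # On the Warehouse Connector host, check through the local NFS mount:
    ls /saw/Ionsaw01/rsasoc/v1/sessions/meta
    # Or check HDFS directly from a warehouse node:
    hadoop fs -ls /Ionsaw01/rsasoc/v1/sessions/meta

    # Steps 10-12: test the HiveServer2 connection with the beeline client
    beeline -u "jdbc:hive2://warehouse-node01.example.com:10000/default" -n rsasoc -p <password>

    # Kerberos-enabled example (principal must match your Kerberos realm)
    beeline -u "jdbc:hive2://warehouse-node01.example.com:10000/default;principal=hive/warehouse-node01.example.com@EXAMPLE.COM"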

Procedure for HDP

  1. SSH to the HDP system.
  2. Navigate to /usr/hdp/current/hive-server2/conf/hive-site.xml or /etc/hive/conf/hive-site.xml.
  3. Append the following property to the hive-site.xml file:
    <property>
    <name>hive.security.authorization.sqlstd.confwhitelist</name><value>hive\.auto\..*|hive\.cbo\..*|hive\.convert\..*|hive\.exec\.dynamic\.partition.*|hive\.exec\..*\.dynamic\.partitions\..*|hive\.exec\.compress\..*|hive\.exec\.infer\..*|hive\.exec\.mode.local\..*|hive\.exec\.orc\..*|hive\.exec\.parallel.*|hive\.explain\..*|hive\.fetch.task\..*|hive\.groupby\..*|hive\.hbase\..*|hive\.index\..*|hive\.index\..*|hive\.intermediate\..*|hive\.join\..*|hive\.limit\..*|hive\.log\..*|hive\.mapjoin\..*|hive\.merge\..*|hive\.optimize\..*|hive\.orc\..*|hive\.outerjoin\..*|hive\.parquet\..*|hive\.ppd\..*|hive\.prewarm\..*|hive\.server2\.proxy\.user|hive\.skewjoin\..*|hive\.smbjoin\..*|hive\.stats\..*|hive\.tez\..*|hive\.vectorized\..*|mapred\.map\..*|mapred\.reduce\..*|mapred\.output\.compression\.codec|mapred\.job\.queuename|mapred\.output\.compression\.type|mapred\.min\.split\.size|mapreduce\.job\.reduce\.slowstart\.completedmaps|mapreduce\.job\.queuename|mapreduce\.job\.tags|mapreduce\.input\.fileinputformat\.split\.minsize|mapreduce\.map\..*|mapreduce\.reduce\..*|mapreduce\.output\.fileoutputformat\.compress\.codec|mapreduce\.output\.fileoutputformat\.compress\.type|tez\.am\..*|tez\.task\..*|tez\.runtime\..*|tez.queue.name|hive\.exec\.reducers\.bytes\.per\.reducer|hive\.client\.stats\.counters|hive\.exec\.default\.partition\.name|hive\.exec\.drop\.ignorenonexistent|hive\.counters\.group\.name|hive\.default\.fileformat\.managed|hive\.enforce\.bucketing|hive\.enforce\.bucketmapjoin|hive\.enforce\.sorting|hive\.enforce\.sortmergebucketmapjoin|hive\.cache\.expr\.evaluation|hive\.hashtable\.loadfactor|hive\.hashtable\.initialCapacity|hive\.ignore\.mapjoin\.hint|hive\.limit\.row\.max\.size|hive\.mapred\.mode|hive\.map\.aggr|hive\.compute\.query\.using\.stats|hive\.exec\.rowoffset|hive\.variable\.substitute|hive\.variable\.substitute\.depth|hive\.autogen\.columnalias\.prefix\.includefuncname|hive\.autogen\.columnalias\.prefix\.label|hive\.exec\.check\.crossproducts|hive\.compat|hive\.exec\.concatenate\.check\.index|hive\.display\.partition\.cols\.separately|hive\.error\.on\.empty\.partition|hive\.execution\.engine|hive\.exim\.uri\.scheme\.whitelist|hive\.file\.max\.footer|hive\.mapred\.supports\.subdirectories|hive\.insert\.into\.multilevel\.dirs|hive\.localize\.resource\.num\.wait\.attempts|hive\.multi\.insert\.move\.tasks\.share\.dependencies|hive\.support\.quoted\.identifiers|hive\.resultset\.use\.unique\.column\.names|hive\.analyze\.stmt\.collect\.partlevel\.stats|hive\.server2\.logging\.operation\.level|hive\.support\.sql11\.reserved\.keywords|hive\.exec\.job\.debug\.capture\.stacktraces|hive\.exec\.job\.debug\.timeout|hive\.exec\.max\.created\.files|hive\.exec\.reducers\.max|hive\.reorder\.nway\.joins|hive\.output\.file\.extension|hive\.exec\.show\.job\.failure\.debug\.info|hive\.exec\.tasklog\.debug\.timeout|mapred\.input\.dir\.recursive|hive\.mapred\.supports\.subdirectories|mapred\.max\.split\.size</value>
    </property>
  4. Save the hive-site.xml file.
  5. Make sure that the Hive server is listening on port 10000 by executing the following commands (see the verification sketch after this procedure):
    1. netstat -ntpl | grep 10000
    2. Kill the process listening on port 10000, if one is running.
    3. su - hive
    4. hive --service hiveserver2 &
    5. netstat -anp | grep 10000
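
A minimal verification sketch, assuming HiveServer2 is reachable locally on port 10000 and the hive user can authenticate; the URL and user name are examples. It confirms that the whitelist property added to hive-site.xml is in effect after the restart.

    # Connect with beeline and print the whitelist property that was added above
    beeline -u "jdbc:hive2://localhost:10000/default" -n hive \
      -e "SET hive.security.authorization.sqlstd.confwhitelist;"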

Result

Security Analytics adds the Warehouse as a data source that is available for reports and alerts on this Reporting Engine.
