Warehouse: Step 6. Configure the Destination Using WebHDFS

Document created by RSA Information Design and Development on Nov 23, 2016

This topic describes how to configure the Warehouse Connector service to write the collected data to a Hadoop-based distributed computing system that supports WebHDFS.

Prerequisites

Make sure that you have:

  • Installed the Warehouse Connector service or virtual appliance in your network environment.
  • Added the Warehouse Connector service to Security Analytics. For more information, see the Add a Service to a Host topic in the Hosts and Services Getting Started Guide.
  • Added the hostname (or FQDN) and IP address of the warehouse nodes and Warehouse Connector to the DNS server. If the DNS server is not configured, then add the hostname (or FQDN) and IP address of the warehouse nodes and Warehouse Connector to the file on the host on which the Warehouse Connector service is installed.
  • If you want Kerberos authentication between the Warehouse Connector and the warehouse cluster, make sure that:
    • A Kerberos Key Distribution Center (KDC) server is configured in your network environment and the Kerberos keytab file is copied to the host on which you have installed Warehouse Connector.
    • Kerberos authentication is enabled in the warehouse cluster. For instructions, see the Warehouse documentation.
  • If you want to enable checksum validation to verify the integrity of the AVRO files that are transferred from the Warehouse Connector to the destinations, configure SSH key-based access between the Warehouse Connector and the Warehouse host or Hadoop node: generate the keys without setting a passphrase, and exchange keys between the Warehouse Connector and the warehouse nodes. For more information, see Configure SSH Keys below.
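
The name-resolution prerequisite above can be spot-checked from the Warehouse Connector host. The following is a minimal sketch, not part of the product: the function name resolves and the HOSTS variable are hypothetical, and it assumes the getent utility is available on the host. getent consults the system resolver, so it succeeds whether a name comes from DNS or from a local file.

```shell
# Space-separated names to check. "localhost" is only a placeholder;
# replace it with your warehouse nodes and Warehouse Connector host names.
HOSTS="${HOSTS:-localhost}"

# Return 0 if the name resolves through the system resolver, non-zero otherwise.
resolves() {
    getent hosts "$1" >/dev/null 2>&1
}

for h in $HOSTS; do
    if resolves "$h"; then
        echo "OK   $h"
    else
        echo "FAIL $h"
    fi
done
```

Any name reported as FAIL would still need to be added to the DNS server or to the local file on the host.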

Configure Warehouse Connector to Write to a Remote Destination

To configure the destination:

  1. Log on to Security Analytics.
  2. In the Security Analytics menu, select Administration > Services.
  3. In the Services view, select the added Warehouse Connector service, and then select Actions > View > Config.
    The Services Config view of Warehouse Connector is displayed.
  4. On the Sources and Destinations tab, in the Destination Configuration section, click the add icon.
  5. In the Add Destination dialog, select WebHDFS from the drop-down list.
  6. In the Name field, enter a unique symbolic name for the destination.

    Note: The Name field does not support spaces or special characters other than the underscore (_).

  7. In the Hadoop IP field, enter the namenode IP address of the warehouse cluster.
  8. In the Hadoop Port field, enter the base port that is used by the namenode web user interface.
  9. In the Username field, enter the owner of the directory in the warehouse to which Warehouse Connector should write the data.
  10. In the Hadoop Path field, enter the path of the directory in the warehouse to which Warehouse Connector should write the data.
  11. Select the Kerberos Authentication checkbox if you want the Warehouse Connector to communicate securely with the warehouse using Kerberos authentication.
    Perform the following:
    1. In the Kerberos Principal field, enter the KDC Principal used for Kerberos authentication.
    2. In the Kerberos Keytab File Path field, enter the path of the Kerberos Keytab file in the Warehouse Connector.
  12. Click Save.
  13. (Optional) If you want to enable checksum validation, perform the following:
    1. In the Security Analytics menu, select Administration > Services.
    2. In the Services view, select the added Warehouse Connector service, and then select Actions > View > Explore.
      The Explore view of Warehouse Connector is displayed.
    3. In the options panel, navigate to warehouseconnector/destinations/webhdfs/config.
    4. Set the parameter isChecksumValidationRequired to 1.
    5. Restart the respective stream.
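
The values entered in the Hadoop IP, Hadoop Port, Username, and Hadoop Path fields can be sanity-checked from the Warehouse Connector host with a WebHDFS REST call. The following is a minimal sketch under stated assumptions: the function name webhdfs_url and all example values are hypothetical, port 50070 is only the common default for the namenode web interface on Hadoop 2.x, and the cluster is assumed to use simple (non-Kerberos) authentication, where WebHDFS accepts the user in the user.name query parameter.

```shell
# Build a WebHDFS REST URL from the values entered in the Add Destination dialog.
# Arguments: namenode IP, namenode web port, absolute Hadoop path, username, operation.
webhdfs_url() {
    printf 'http://%s:%s/webhdfs/v1%s?op=%s&user.name=%s\n' "$1" "$2" "$3" "$5" "$4"
}

# Hypothetical example values; replace with your own.
NAMENODE_IP="10.0.0.5"
NAMENODE_PORT="50070"     # assumed default; use the port configured on your cluster
WH_PATH="/rsasoc"
WH_USER="hdfsuser"

# GETFILESTATUS returns metadata for the path, including its owner.
URL=$(webhdfs_url "$NAMENODE_IP" "$NAMENODE_PORT" "$WH_PATH" "$WH_USER" GETFILESTATUS)
echo "$URL"

# To query the cluster, pass the URL to curl, for example:
#   curl -s "$URL"
```

On a Kerberos-enabled cluster, a comparable check would typically obtain a ticket first with kinit -kt followed by the keytab path and the principal, and then call curl with --negotiate -u : instead of relying on user.name.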

Configure SSH Keys

Follow these steps to configure SSH key-based access between the Warehouse Connector and the Warehouse host or Hadoop node.

To configure SSH keys:

  1. Generate SSH keys on the Warehouse Connector at the default location. Perform the following:

    1. Log on to the Warehouse Connector.
    2. Type the following command and press ENTER:

      $ ssh-keygen -t dsa

    3. The command prompts you to enter the file in which to save the generated key.

      Enter file in which to save the key (/root/.ssh/id_dsa):

    4. Enter the file in which you want to save the key and press ENTER.

      The command prompts you to enter and confirm the passphrase.

      Enter passphrase (empty for no passphrase):
      Enter same passphrase again:

      Note: Make sure that you do not set the passphrase. Press ENTER at both prompts to leave it empty.

      The public key is generated and is saved in the location that you provided.

  2. Append the generated public key to the authorized keys list on the remote Warehouse host or Hadoop node, located at: ~/.ssh/authorized_keys

    Note: Make sure that you copy the public key to the Hadoop node. When copying the key, log in as the user with which the WebHDFS destination will be added.
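
The append in step 2 can be sketched as a small helper that adds a public key to an authorized keys file only if it is not already listed. This is an illustrative sketch, not a product command: the function name append_key is hypothetical, and it operates on whatever file paths you pass to it.

```shell
# Append the public key in file $1 to the authorized keys file $2,
# skipping the append when the exact key line is already present.
append_key() {
    pub="$1"
    auth="$2"
    mkdir -p "$(dirname "$auth")"
    touch "$auth"
    # -x: whole-line match, -F: fixed string, -f: read the pattern from the key file
    if ! grep -qxFf "$pub" "$auth"; then
        cat "$pub" >> "$auth"
    fi
    # sshd usually ignores an authorized keys file that other users can write to
    chmod 600 "$auth"
}

# Example, run on the Hadoop node after the public key has been copied there:
#   append_key /tmp/id_dsa.pub ~/.ssh/authorized_keys
```

From the Warehouse Connector, the same result is commonly achieved in one step with ssh-copy-id -i followed by the public key file and the user@host target, provided that utility is installed.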

Result

The Warehouse Connector can now communicate securely with the Warehouse nodes or Hadoop nodes.
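
One way to confirm the result is to attempt a login that is not allowed to prompt. The sketch below is an assumption-laden illustration: the function name can_login_with_key, the user, and the host are hypothetical. It relies on the standard OpenSSH options BatchMode, which makes ssh fail instead of asking for a password or passphrase, and ConnectTimeout.

```shell
# SSH client to use; overridable so the function can be exercised without a real host.
SSH="${SSH:-ssh}"

# Return 0 if user $1 on host $2 accepts key-based login without prompting.
can_login_with_key() {
    "$SSH" -o BatchMode=yes -o ConnectTimeout=5 "$1@$2" true
}

# Example (hypothetical user and host), run on the Warehouse Connector:
#   can_login_with_key hdfsuser nn1.example.com && echo "key-based access works"
```

A non-zero return usually means the key was not appended for that user, or that the permissions on the remote ~/.ssh directory or authorized_keys file are too open.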
