This topic provides an overview of the Warehouse Connector. The Warehouse Connector collects meta and events from Decoder and Log Decoder services and writes them in AVRO format to a Hadoop-based distributed computing system.
You can set up the Warehouse Connector as a service on your existing Log Decoder or Decoder hosts, or run it as a virtual appliance in your virtual environment. The Warehouse Connector is available as an RPM package file and as an Open Virtual Appliance (OVA) file.
- Use the RPM package file to set up the Warehouse Connector as a service on your existing Log Decoder or Decoder hosts. For instructions, see Install Warehouse Connector Service on a Log Decoder or Decoder.
- Use the OVA file to set up the Warehouse Connector virtual appliance. For instructions, see the Virtual Host Setup Guide.
Components of Warehouse Connector
The Warehouse Connector contains the following components:
- Data Source
- Destination
- Data Stream
A data source is the service from which the Warehouse Connector collects data to store in the destination. The supported data sources are the Log Decoder and Decoder services: the Log Decoder collects log events, and the Decoder collects packets and the meta extracted from them.
The destination is the Hadoop-based distributed computing system that collects, manages, and enables reporting and analytics on security data. The following destinations are supported:
- RSA Analytics Warehouse (MapR) deployments
- RSA Analytics Warehouse (Pivotal) deployments
- Any Hadoop-based distributed computing system that supports WebHDFS or NFS mounting of HDFS file systems.
  - Example: Commercial MapR M5 Enterprise Edition for Apache Hadoop
A data stream is a logical connection between the data source and the destination. You can have multiple streams for different subsets of the collected data, and you can set up streams to segregate data from multiple Decoder and Log Decoder services. A stream can have multiple data sources and a single destination, or a single data source and a single destination.
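As a rough illustration of the stream rule described above (one or more sources feeding exactly one destination), here is a minimal sketch; the class and field names are invented for illustration and are not part of the product's API:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Stream:
    """Illustrative model of a Warehouse Connector stream:
    one or more data sources, exactly one destination."""
    name: str
    destination: str                      # e.g. a WebHDFS or NFS-mounted HDFS target
    sources: List[str] = field(default_factory=list)

    def add_source(self, source: str) -> None:
        # A stream may aggregate several Decoder / Log Decoder services.
        if source not in self.sources:
            self.sources.append(source)

stream = Stream(name="logs-to-warehouse", destination="hdfs://warehouse/rsa")
stream.add_source("log-decoder-01")
stream.add_source("decoder-01")
print(len(stream.sources))  # 2
```

Because the destination is a single field rather than a list, the sketch structurally enforces the single-destination constraint.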
Features of Warehouse Connector
The Warehouse Connector provides the following features:
- Aggregates session and raw log data from Decoders and Log Decoders.
- Transfers the aggregated data to supported destinations such as Hadoop-based deployments.
- Serializes the aggregated data, including both the schema and the data, into AVRO format.
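The AVRO serialization mentioned above rests on Avro's compact binary encoding. As a hedged illustration of that format (not the Warehouse Connector's actual code), the two primitive encodings Avro uses for longs and strings can be sketched as:

```python
def encode_long(n: int) -> bytes:
    """Avro long: zig-zag encoding followed by variable-length base-128."""
    z = (n << 1) ^ (n >> 63)          # zig-zag maps signed ints to unsigned
    out = bytearray()
    while True:
        byte = z & 0x7F
        z >>= 7
        if z:
            out.append(byte | 0x80)   # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def encode_string(s: str) -> bytes:
    """Avro string: length (encoded as an Avro long) followed by UTF-8 bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data

print(encode_long(1))        # b'\x02'
print(encode_string("abc"))  # b'\x06abc'
```

Because the schema travels alongside the data in an Avro file, a reader never has to guess field order or types; the bytes above are just the per-value payload.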
Meta Filters
Meta filters in the Warehouse Connector enable you to specify which metas are written to the Warehouse. For more information, see Specify Meta Filters.
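Conceptually, a meta filter acts as an allow-list over the meta keys in each session before they are written out. A minimal sketch of the idea follows; the function name and meta keys are invented for illustration, not the product's API:

```python
def filter_meta(session: dict, allowed: set) -> dict:
    """Keep only the meta keys that the filter allows."""
    return {key: value for key, value in session.items() if key in allowed}

session = {"ip.src": "10.0.0.1", "ip.dst": "10.0.0.2", "payload": "..."}
print(filter_meta(session, {"ip.src", "ip.dst"}))
# {'ip.src': '10.0.0.1', 'ip.dst': '10.0.0.2'}
```

Filtering before the write keeps unwanted meta out of the Warehouse entirely, rather than discarding it at query time.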
Support for Multi-Valued Meta
RSA Analytics Warehouse supports multi-valued meta, that is, meta fields of the array type. You can use the meta library to determine which meta fields are of type array and write Hive queries with the correct array syntax. By default, the following metas are treated as multi-valued; they are defined in the file multivalue-bootstrap.xml, located in /etc/netwitness/ng on the Warehouse Connector:
You can also define an existing meta or a custom meta to be treated as multi-valued meta by performing the following:
- Create a new file with the filename multivalue-users.xml in the /etc/netwitness/ng directory.
- Add the following entries:
Where NEWMETANAME is the existing meta or a custom meta to be treated as multi-valued meta.
- Reload the Warehouse Connector Stream. For more information, see Services Config View - Warehouse Connector.
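For the entries in step 2, an illustrative sketch of the file's shape is shown below; the element names are assumptions modeled on the bootstrap file's purpose, not a confirmed schema, so consult multivalue-bootstrap.xml on your host for the exact format:

```xml
<!-- /etc/netwitness/ng/multivalue-users.xml (element names are illustrative) -->
<multivalue>
    <meta>NEWMETANAME</meta>
</multivalue>
```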
Checksum Validation
The Warehouse Connector can validate the file integrity of the AVRO files it transfers to the data destinations. To use this feature, enable checksum validation when you configure the Warehouse Connector.
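Checksum validation of a transferred file generally works by comparing a digest computed at the source with one computed at the destination. The following is a generic sketch of that idea using SHA-256, not the product's internal mechanism:

```python
import hashlib

def file_checksum(path: str, algo: str = "sha256") -> str:
    """Compute a hex digest of a file, streaming in chunks."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def transfer_is_intact(source_path: str, dest_path: str) -> bool:
    # The transfer is considered valid only if both digests match.
    return file_checksum(source_path) == file_checksum(dest_path)
```

Streaming in fixed-size chunks keeps memory use constant regardless of how large the transferred AVRO file is.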
Lockbox Support
A lockbox is an encrypted file that the Warehouse Connector uses to store and protect sensitive data. You must create the lockbox by providing a lockbox password when you configure the Warehouse Connector for the first time.