000026881 - How to find the Avro Schema for a particular meta value for my Expert Hive Query in RSA Security Analytics

Document created by RSA Customer Support Employee on Jun 14, 2016Last modified by RSA Customer Support Employee on Apr 21, 2017
Version 3Show Document
  • View in full screen mode

Article Content

Article Number000026881
Applies ToRSA Security Analytics
RSA Security Analytics Reporting Engine
RSA Security Analytics Warehouse
IssueHow can I find the Avro Schema for a particular meta value for my Hive Query.

Advanced Hive Queries for the SAW Warehouse are of the form:

         SET hive.mapred.supports.subdirectories=true;
         SET mapred.input.dir.recursive=true;
         DROP TABLE IF EXISTS logs;
         CREATE EXTERNAL TABLE logs ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
         STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
         OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
         LOCATION '/lonsaw/rsasoc/v1/sessions/data/2014/6/30/'
         TBLPROPERTIES  ('avro.schema.literal'='{"name":"sessions","type":"record","fields":[{"default":null,"name":"time","type":["null","long"]},{"default":null,"name":"OS","type":["null","string"]},{"default":null,"name":"device_type","type":["null","string"]},{"default":null,"name":"alias_host","type":["null",{"items":"string","type":"array"}]}]}');

         SELECT COUNT(alias_host[0]) AS TOTAL ,alias_host[0] AS DOMAIN FROM logs where alias_host IS NOT NULL AND device_type RLIKE 'squid'
         GROUP BY alias_host[0] ORDER BY TOTAL DESC;

To write a query to extract that particular meta values you want you need to know the schema of that meta value in order that it can be entered in the field 'avro.schema.literal' = '...'

One method of finding the schema is to:

1) Create a Query in The Warehouse by going to Reporting ->Manage ->Rule ->Create a New Warehouse Rule (This is shown in the picture below)

2) Do not chose expert mode at this point

3) Under select enter the meta key you are inerested in finding the schema for ( eg alias_host) (Remember that queries in the warehouse have dots in meta data replaced by _)

4) Choose sessions

5) Run the test query This will cause a text file to be written in the direcotory /tmp/mapr on the saw node that actually runs the job. This could be any of your saw nodes.

6) SSH to your saw nodes and in the type the following command grep alias_host /tmp/mapr |grep literal

7) On the node where the job ran you will be able to see the query that ran. It will look similar to the following

Counters plan="{"queryId":"mapr_20140703141313_6b2c9fc2-79e0-4838-90da-d4e06bd711a2","queryType":null,"queryAttributes":{"queryString":"create external table reporting_sessionsba6d02a8_f9e2_42c8_9dc9_630c6716e827 partitioned by (partition_id int) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES ('avro.schema.literal'='{"name":"sessions","type":"record","fields":[{"default":null,"name":"time","type":["null","long"]},{"default":null,"name":"OS","type":["null","string"]},{"default":null,"name":"alias_host","type":["null",{"items":"string","type":"array"}]}]}') STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' location '/lonsaw/rsasoc/v1/sessions'"},"queryCounters":"null","stageGraph":{"nodeType":"STAGE","roots":"null","adjacencyList":"]"},"stageList":[{"stageId":"Stage-0","stageType":"DDL","stageAttributes":"null","stageCounters":"}","taskList":[{"taskId":"Stage-0_OTHER","taskType":"OTHER","taskAttributes":"null","taskCounters":"null","operatorGraph":"null","operatorList":"]","done":"true","started":"true"}],"done":"true","started":"true"}],"done":"true","started":"true"}" TIME="1404396804719"

From this query you can use values in



If you are unsure of any of the steps above or experience any issues, contact RSA Support and quote this article ID for further assistance.


To find the correct schema values to use in your own hive query, refer to the screenshot below.


Legacy Article IDa66738