Article Content
Article Number | 000026881 |
Applies To | RSA Security Analytics RSA Security Analytics Reporting Engine RSA Security Analytics Warehouse |
Issue | How can I find the Avro Schema for a particular meta value for my Hive Query. |
Resolution | Advanced Hive Queries for the SAW Warehouse are of the form: SET hive.mapred.supports.subdirectories=true; SELECT COUNT(alias_host[0]) AS TOTAL ,alias_host[0] AS DOMAIN FROM logs where alias_host IS NOT NULL AND device_type RLIKE 'squid' To write a query to extract that particular meta values you want you need to know the schema of that meta value in order that it can be entered in the field 'avro.schema.literal' = '...' One method of finding the schema is to: 1) Create a Query in The Warehouse by going to Reporting ->Manage ->Rule ->Create a New Warehouse Rule (This is shown in the picture below) 2) Do not chose expert mode at this point 3) Under select enter the meta key you are inerested in finding the schema for ( eg alias_host) (Remember that queries in the warehouse have dots in meta data replaced by _) 4) Choose sessions 5) Run the test query This will cause a text file to be written in the direcotory /tmp/mapr on the saw node that actually runs the job. This could be any of your saw nodes. 6) SSH to your saw nodes and in the type the following command grep alias_host /tmp/mapr |grep literal 7) On the node where the job ran you will be able to see the query that ran. It will look similar to the following Counters plan="{"queryId":"mapr_20140703141313_6b2c9fc2-79e0-4838-90da-d4e06bd711a2","queryType":null,"queryAttributes":{"queryString":"create external table reporting_sessionsba6d02a8_f9e2_42c8_9dc9_630c6716e827 partitioned by (partition_id int) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' WITH SERDEPROPERTIES ('avro.schema.literal'='{"name":"sessions","type":"record","fields":[{"default":null,"name":"time","type":["null","long"]},{"default":null,"name":"OS","type":["null","string"]},{"default":null,"name":"alias_host","type":["null",{"items":"string","type":"array"}]}]}') STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat' location '/lonsaw/rsasoc/v1/sessions'"},"queryCounters":"null","stageGraph":{"nodeType":"STAGE","roots":"null","adjacencyList":"]"},"stageList":[{"stageId":"Stage-0","stageType":"DDL","stageAttributes":"null","stageCounters":"}","taskList":[{"taskId":"Stage-0_OTHER","taskType":"OTHER","taskAttributes":"null","taskCounters":"null","operatorGraph":"null","operatorList":"]","done":"true","started":"true"}],"done":"true","started":"true"}],"done":"true","started":"true"}" TIME="1404396804719" ('avro.schema.literal'='{"name":"sessions","type":"record","fields":[{"default":null,"name":"time","type":["null","long"]},{"default":null,"name":"OS","type":["null","string"]},{"default":null,"name":"alias_host","type":["null",{"items":"string","type":"array"}]}]}')
If you are unsure of any of the steps above or experience any issues, contact RSA Support and quote this article ID for further assistance. |
Notes | To find the correct schema values to use in your own hive query, refer to the screenshot below. |
Legacy Article ID | a66738 |