000034086 - Error "No space left on device" on Warehouse Nodes in RSA Security Analytics

Document created by RSA Customer Support Employee on Nov 19, 2016. Last modified by RSA Customer Support Employee on Apr 21, 2017.
Version 2

Article Content

Article Number: 000034086
Applies To
RSA Product Set: NetWitness Logs and Packets, Security Analytics
RSA Product/Service Type: MapR Warehouse (SAW)
Platform: CentOS
O/S Version: EL6
Issue
Reports that use the warehouse DB as the source cannot be run; the Reporting Engine displays a "No value returned" message.
Example error from the /opt/rsa/saw/logs/saw.log file:
java.io.IOException: Error: 28:No space left on device(28), file: job_201506020742_3595677 Cluster is full

To confirm the cause, run the command below on every node and check whether any disk in the cluster is full.
# /opt/mapr/server/mrconfig sp list
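
Checking the storage pool listing by eye on every node is error-prone, so a small filter can help. The sketch below is not part of the original procedure. It assumes each storage pool appears on one line of the form `SP 0: name SP1, Online, size 47539 MiB, free 46644 MiB, path /dev/sdb`; verify this against the output on your nodes before relying on it. The function name `sp_usage` and the 90% default threshold are illustrative choices.

```shell
#!/bin/sh
# sp_usage [THRESHOLD]: read "mrconfig sp list" output on stdin and print
# one line per storage pool: name, percent used, and FULL or ok.
# ASSUMPTION: pools are reported on lines such as
#   SP 0: name SP1, Online, size 47539 MiB, free 46644 MiB, path /dev/sdb
# Adjust the parsing if your output differs.
sp_usage() {
    threshold="${1:-90}"
    awk -v limit="$threshold" '
        /^SP [0-9]+:/ {
            size = 0; free = 0; name = "?"
            # Pick out the values that follow the "name", "size" and "free" keywords.
            for (i = 1; i < NF; i++) {
                if ($i == "name") { name = $(i + 1); sub(/,$/, "", name) }
                if ($i == "size") { size = $(i + 1) }
                if ($i == "free") { free = $(i + 1) }
            }
            if (size > 0) {
                used = 100 * (size - free) / size
                flag = (used >= limit) ? "FULL" : "ok"
                printf "%s %.0f%% %s\n", name, used, flag
            }
        }'
}
```

Example usage on a node: `/opt/mapr/server/mrconfig sp list | sp_usage 90`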
Resolution
Log in to one of the SAW nodes and run the following procedure:
  1. Check the current replication setting on the SAW node with the command below.
    maprcli volume info -name mapr.cluster.root -json

  2. Reduce the replication factor from 3 to 2 and enable the Disk Balancer, which moves data from full disks to disks with free space.
    maprcli volume modify -name mapr.cluster.root -replication 2
    maprcli config save -values ' { "cldb.balancer.disk.paused":"0" } '

    Reducing replication from 3 to 2 removes one of the three copies of each file from the cluster, which lowers disk usage by about 33%.
  3. Set the balancer thresholds with the commands below.
    maprcli config save -values ' { "cldb.balancer.disk.sleep.interval.sec":"120" } '
    maprcli config save -values ' { "cldb.balancer.startup.interval.sec":"600" } '
    maprcli config save -values ' { "cldb.balancer.disk.threshold.percentage":"50"  } '

  4. After running these commands, wait 5 minutes and then run the command below.
    maprcli dump balancerinfo

The output of the above command has two columns with names similar to In-Transit and Out-Transit. These numbers show the amount of data being moved from one disk to another.
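
Steps 2 through 4 above could be collected into one script, sketched below. The `maprcli` commands are the ones listed in the procedure, with only the whitespace inside the JSON values trimmed. The `run` wrapper, the `rebalance_cluster` function name, and the `DRY_RUN` variable are hypothetical helpers added for illustration; with `DRY_RUN=1` the commands are only printed so they can be reviewed first.

```shell
#!/bin/sh
# run CMD...: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

rebalance_cluster() {
    # Step 2: lower replication to 2 and unpause the disk balancer.
    run maprcli volume modify -name mapr.cluster.root -replication 2 || return 1
    run maprcli config save -values '{"cldb.balancer.disk.paused":"0"}' || return 1

    # Step 3: set the balancer thresholds.
    run maprcli config save -values '{"cldb.balancer.disk.sleep.interval.sec":"120"}' || return 1
    run maprcli config save -values '{"cldb.balancer.startup.interval.sec":"600"}' || return 1
    run maprcli config save -values '{"cldb.balancer.disk.threshold.percentage":"50"}' || return 1

    # Step 4: wait 5 minutes, then show what the balancer is moving.
    run sleep 300 || return 1
    run maprcli dump balancerinfo
}
```

To preview the commands without changing the cluster, set `DRY_RUN=1` and call `rebalance_cluster`. Leave `DRY_RUN` unset to execute them.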
Notes
Wait 1 hour to make sure all disks are balanced before running another job.
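
During the one-hour wait, it may help to recheck the balancer periodically instead of waiting blind. The following is a minimal sketch, not part of the original procedure: `repeat_cmd` is a hypothetical helper that runs any command a fixed number of times with a pause between runs. It does not interpret the balancer output, so reading the In-Transit and Out-Transit figures is still a manual step.

```shell
#!/bin/sh
# repeat_cmd COUNT INTERVAL_SEC CMD...
# Run CMD COUNT times, sleeping INTERVAL_SEC seconds between runs.
# Stops early and returns non-zero if CMD fails.
repeat_cmd() {
    count="$1"; interval="$2"; shift 2
    i=1
    while [ "$i" -le "$count" ]; do
        echo "--- check $i of $count ---"
        "$@" || return 1
        # No need to sleep after the final check.
        [ "$i" -lt "$count" ] && sleep "$interval"
        i=$((i + 1))
    done
}
```

For example, `repeat_cmd 7 600 maprcli dump balancerinfo` checks seven times at 10-minute intervals, which spans about one hour.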
