000035268 - When running hadoop fs commands on MapR SAW, receiving createMapRFileStatus & readdirplus errors in RSA NetWitness 10.3.x

Document created by RSA Customer Support Employee on Jun 21, 2017
Version 1

Article Content

Article Number: 000035268

Applies To
RSA Product Set: Security Analytics
RSA Product/Service Type: Security Analytics Warehouse (SAW)
RSA Version/Condition: 10.3.x
Platform: CentOS
Platform (Other): MapR Hadoop v2.0.1
O/S Version: EL6

Issue
Running any of the following hadoop fs commands produces errors:
# hadoop fs -ls
# hadoop fs -du
# hadoop fs -dus

The last entry of 'hadoop fs -ls' is the staging directory, followed by numerous readdirplus and createMapRFileStatus errors:
drwx------   - mapr mapr     910386 2017-05-19 08:00 /var/mapr/cluster/mapred/jobTracker/staging/mapr/.staging
2017-05-19 08:33:05,7567 ERROR JniCommon fs/client/fileclient/cc/jni_common.cc:1826 Thread: 139741497308928 readdirplus failed, could not create MapRFileStatus object for /var/mapr/cluster/mapred/jobTracker/staging/mapr/.staging/job_2
2017-05-19 08:33:20,6629 ERROR JniCommon fs/client/fileclient/cc/jni_common.cc:695 Thread: 139741497308928 createMapRFileStatus failed, could not get gidstr for gid 2147483632
2017-05-19 08:33:41,0532 ERROR JniCommon fs/client/fileclient/cc/jni_common.cc:1826 Thread: 139741497308928 readdirplus failed, could not create MapRFileStatus object for /var/mapr/cluster/mapred/jobTracker/staging/mapr/.staging/job_2
2017-05-19 08:33:54,2379 ERROR JniCommon fs/client/fileclient/cc/jni_common.cc:1826 Thread: 139741497308928 readdirplus failed, could not create MapRFileStatus object for /var/mapr/cluster/mapred/jobTracker/staging/mapr/.staging/job_2
2017-05-19 08:34:52,1719 ERROR JniCommon fs/client/fileclient/cc/jni_common.cc:706 Thread: 139741497308928 createMapRFileStatus failed, could not get comrpessionTypeStr  for compressionType 0
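
Note the unresolvable GID (2147483632) in the createMapRFileStatus error. As an optional diagnostic (not part of the original resolution), you can check whether that GID resolves on the node:
# getent group 2147483632

No output from getent means the group does not exist on the node, which is consistent with the client failing to build MapRFileStatus objects for those entries.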

Cause
Corrupt or stale entries under the directory /var/mapr/cluster/mapred/jobTracker/staging/mapr/.staging prevent the MapR file client from creating MapRFileStatus objects, so the directory must be recreated.

Resolution
1. Stop jobtracker service
# maprcli node services -jobtracker stop -nodes SAWNODE1 SAWNODE2 SAWNODE3

If you receive an error similar to:
ERROR (10008) -  Input for nodes: [SAWNODE3] does not match the IP address or hostname of any cluster nodes.

You can first try the following command:
# hadoop job -unblacklist-tracker SAWNODE3

If the error persists, you can SSH to the affected node (SAWNODE3 in this example) and run the commands on that single node:
# hadoop job -unblacklist-tracker SAWNODE3
# service mapr-warden restart
# maprcli node services -jobtracker stop -nodes SAWNODE3
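
SAWNODE1, SAWNODE2 and SAWNODE3 are placeholders for your cluster's node names. If you are unsure of the exact names registered with the cluster, you can list them first (a minimal example; available columns vary slightly by MapR version):
# maprcli node list -columns hostname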

2. Stop tasktracker service
# maprcli node services -tasktracker stop -nodes SAWNODE1 SAWNODE2 SAWNODE3

3. Check that the services have stopped
Run the following command and confirm that tasktracker (which usually disappears first) and jobtracker no longer appear in the service list:
# maprcli node list -columns ip,id,service,health,Disk,healthDesc
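
If the service column is long, the same listing can be filtered down to the two services of interest (an illustrative one-liner; an empty result means neither service is still running):
# maprcli node list -columns ip,service | grep -iE 'jobtracker|tasktracker'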

4. Recreate the directory and set the correct ownership
# hadoop fs -rmr -skipTrash /var/mapr/cluster/mapred/jobTracker/staging/mapr/.staging
# hadoop fs -mkdir /var/mapr/cluster/mapred/jobTracker/staging/mapr/.staging
# hadoop fs -chown -R mapr:mapr /var
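
As a quick verification (not part of the original steps), list the parent directory and confirm the recreated .staging is owned by mapr:mapr:
# hadoop fs -ls /var/mapr/cluster/mapred/jobTracker/staging/mapr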

Optionally, you can also set the owner on the rest of the common volumes to ensure they are correct:
# hadoop fs -chown -R mapr:mapr /hbase
# hadoop fs -chown -R mapr:mapr /index-scratch
# hadoop fs -chown -R mapr:mapr /jars
# hadoop fs -chown -R mapr:mapr /logs
# hadoop fs -chown -R mapr:mapr /saw
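
A listing of the filesystem root shows the owner and group of each of these top-level directories at a glance, which is a simple way to confirm the chown commands took effect:
# hadoop fs -ls /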

5. Run gfsck on the appropriate volumes
# /opt/mapr/bin/gfsck rwvolume=mapr.jobtracker.volume -d -r
# /opt/mapr/bin/gfsck rwvolume=mapr.cluster.root -d -r

You can optionally also run this on the remaining common volumes:
# /opt/mapr/bin/gfsck rwvolume=mapr.cldb.internal -d -r
# /opt/mapr/bin/gfsck rwvolume=mapr.configuration -d -r
# /opt/mapr/bin/gfsck rwvolume=mapr.hbase -d -r
# /opt/mapr/bin/gfsck rwvolume=mapr.metrics -d -r
# /opt/mapr/bin/gfsck rwvolume=mapr.var -d -r
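
The volume names above are MapR defaults; if your deployment uses different names, you can list the volumes first and adjust the gfsck commands accordingly:
# maprcli volume list -columns volumename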

6. Restart Services
# maprcli node services -jobtracker start -nodes SAWNODE1 SAWNODE2 SAWNODE3
# maprcli node services -tasktracker start -nodes SAWNODE1 SAWNODE2 SAWNODE3

7. Check Services
Run the following command; tasktracker should be running on all nodes and jobtracker on at least one node:
# maprcli node list -columns ip,id,service,health,Disk,healthDesc

8. Reboot the SAW cluster, starting with the CLDB master, followed by the ZooKeeper leader and then the remaining nodes. This ensures that each node's volumes are not left in read-only mode (the reboot can potentially trigger a run of createsystemvolumes.sh).
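
To identify the CLDB master and the ZooKeeper leader before rebooting, the following commands can help (run the second command on each ZooKeeper node and look for Mode: leader in its output; exact formatting varies by MapR version):
# maprcli node cldbmaster
# service mapr-zookeeper qstatus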
 
Notes

The following logs may provide further insight:

/opt/mapr/logs/createsystemvolumes.log
/opt/mapr/logs/mfs.log
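
For example, to scan mfs.log for recent errors (an illustrative one-liner, not from the original article):
# grep -i error /opt/mapr/logs/mfs.log | tail -20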
