000027164 - How to configure SNMP to check replication status

Document created by RSA Customer Support Employee on Jun 14, 2016. Last modified by RSA Customer Support Employee on Apr 21, 2017.
Version 2

Article Content

Article Number: 000027164
Applies To: Authentication Manager 7.1 and Appliance 3.0

Issue: How to configure SNMP to check replication status
SNMP Messages for Replication Monitoring

Resolution

The SNMP agent in RSA AM 7.1 (and Appliance 3.0) can send traps to an SNMP server when various events occur. The latest update can also send status information about replication. This is described under AM-13982 in the readme for 7.1.2 hotfix rollup 1 (the fix is also included in SP4, although the description is not). The detailed description of AM-13982 is below:

AM-13982  SNMP Messages for Replication Monitoring

          Once this Hotfix is applied, if the deployment contains any replicas, the
          status of the replication system will be checked every 30 minutes. The
          results will be logged and generate SNMP traps, subject to the usual SNMP
          and log level settings in the Security Console. In the descriptions below,
          note that only the fields named action, reason, result and arguments
          actually appear in the SNMP trap and log message. The title and message
          are localized English descriptions. This same check can be performed
          interactively in the Operations Console at Deployment Configuration >
          Instances > Status Report.

          This process involves two types of checks. The first check ensures that
          the system performing the check has sufficient archive log space. The
          results of this check are described as follows.

          Action:

          title = "Check replication archive log usage",
          message = "System checked replication archive log usage and found {used} MB used out of {allocated} MB allocated"
          arguments = used, allocated
          action = REPLICATION_ARCHIVE_LOG_USAGE_CHECK

          Reasons:

          title = "Replication archive log disk usage for this site is acceptable"
          reason = REPLICATION_ARCHIVE_LOG_USAGE_LOW
          result = SUCCESS

          title = "Replication archive log disk usage for this site is high"
          reason = REPLICATION_ARCHIVE_LOG_USAGE_HIGH
          result = WARN

          title = "Replication archive logs for this site are full"
          reason = REPLICATION_ARCHIVE_LOG_USAGE_FULL
          result = FAIL

          The second check ensures that the links between the system performing the
          check and each of the other systems are working properly. This check occurs
          twice per link (once in each direction). The results of this check are
          described as follows.

          Action:

          title = "Check replication status",
          message = "System checked replication status between the primary {primary} and the replica {replica} in the direction {direction}"
          arguments = primary, replica, direction
          action = REPLICATION_LINK_STATUS

          Reasons:

          title = "Replication is complete"
          reason = REPLICATION_LINK_STATUS_COMPLETE
          result = SUCCESS

          title = "Replication is in progress"
          reason = REPLICATION_LINK_STATUS_IN_PROGRESS
          result = SUCCESS

          title = "Replication is blocked"
          reason = REPLICATION_LINK_STATUS_BLOCKED
          result = FAIL

          title = "Replication is broken"
          reason = REPLICATION_LINK_STATUS_BROKEN
          result = FAIL

          title = "Network connection is broken"
          reason = REPLICATION_LINK_STATUS_UNREACHABLE
          result = FAIL

          title = "Replication status is unknown"
          reason = REPLICATION_LINK_STATUS_UNKNOWN
          result = FAIL

I have attached two packet captures of the system replication links for comparison: link_status_complete.pcap and link_status_unreachable.pcap. Partial information from each is below.

link_status_complete.pcap:

variable-bindings: 6 items
1. SNMPv2-MIB::sysUpTime.0 (1.3.6.1.2.1.1.3.0): 10876
2. SNMPv2-MIB::snmpTrapOID.0 (1.3.6.1.6.3.1.1.4.1.0): 1.3.6.1.4.1.2197.20.17 (SNMPv2-SMI::enterprises.2197.20.17)
3. SNMPv2-SMI::enterprises.2197.20.16.5.0 (1.3.6.1.4.1.2197.20.16.5.0): 494E464F
4. SNMPv2-SMI::enterprises.2197.20.16.7.0 (1.3.6.1.4.1.2197.20.16.7.0): 3136323539
5. SNMPv2-SMI::enterprises.2197.20.16.6.0 (1.3.6.1.4.1.2197.20.16.6.0): 53797374656D206576656E74207B49443A20343831303539...
6. SNMPv2-SMI::enterprises.2197.20.16.8.0 (1.3.6.1.4.1.2197.20.16.8.0): 5245504C49434154494F4E5F4C494E4B5F5354415455535F...

(embedded text in item 5)
System event {ID: 4810594e5f63650a01686e670039b7c7, time: Tue Jan 19 14:30:16 EST 2010, client: null, user: null, action: REPLICATION_LINK_STATUS, action id: 16259, result: SUCCESS, reason: REPLICATION_LINK_STATUS_COMPLETE, arguments: [w2k3am71-95.jm-vm.com, w2k3am71-96.jm-vm.com, primary to replica]}

link_status_unreachable.pcap:

variable-bindings: 6 items
1. SNMPv2-MIB::sysUpTime.0 (1.3.6.1.2.1.1.3.0): 35657
2. SNMPv2-MIB::snmpTrapOID.0 (1.3.6.1.6.3.1.1.4.1.0): 1.3.6.1.4.1.2197.20.17
3. SNMPv2-SMI::enterprises.2197.20.16.5.0 (1.3.6.1.4.1.2197.20.16.5.0): 4552524F52
4. SNMPv2-SMI::enterprises.2197.20.16.7.0 (1.3.6.1.4.1.2197.20.16.7.0): 3136323539
5. SNMPv2-SMI::enterprises.2197.20.16.6.0 (1.3.6.1.4.1.2197.20.16.6.0): 53797374656D206576656E74207B49443A20343831343231...
6. SNMPv2-SMI::enterprises.2197.20.16.8.0 (1.3.6.1.4.1.2197.20.16.8.0): 5245504C49434154494F4E5F4C494E4B5F5354415455535F...

(embedded text in item 5)
System event {ID: 481421435f63650a02051f553522996a, time: Tue Jan 19 14:34:23 EST 2010, client: null, user: null, action: REPLICATION_LINK_STATUS, action id: 16259, result: FAIL, reason: REPLICATION_LINK_STATUS_UNREACHABLE, arguments: [w2k3am71-95.jm-vm.com, w2k3am71-96.jm-vm.com, primary to replica]}

(embedded text in item 6)
REPLICATION_LINK_STATUS_UNREACHABLE
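
The values in the two listings are shown as hexadecimal octet strings, so decoding them as ASCII makes the variable bindings readable. The following is a minimal Python sketch of that decoding. The helper name decode_octets is hypothetical, and only the complete (untruncated) hex values from the listings are used, because the values ending in "..." cannot be decoded in full.

```python
def decode_octets(hex_string: str) -> str:
    """Decode a hex-displayed SNMP octet string into ASCII text."""
    return bytes.fromhex(hex_string).decode("ascii")


# Complete hex values copied from the two capture listings in this article.
samples = {
    "enterprises.2197.20.16.5.0 (complete capture)": "494E464F",
    "enterprises.2197.20.16.5.0 (unreachable capture)": "4552524F52",
    "enterprises.2197.20.16.7.0 (both captures)": "3136323539",
}

for label, value in samples.items():
    print(label, "->", decode_octets(value))
```

Decoded this way, item 3 (.16.5.0) reads INFO in the successful capture and ERROR in the unreachable capture, and item 4 (.16.7.0) reads 16259, which matches the action id in the embedded text of item 5.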


 


The SNMP trap OID for both events (in fact, for all events in AM 7.1) is 1.3.6.1.4.1.2197.20.17, with additional variable bindings whose OIDs end in .16.5.0, .16.7.0, .16.6.0 and .16.8.0. If you look only at the OID numbers, the two traps may seem the same. The embedded text (particularly of item 5, expanded in the examples) shows various details, including what was checked, the result (SUCCESS, WARN or FAIL), the reason for the result, and other details. Individual OIDs aren't available for each event, so you need to configure your SNMP server to parse and examine the embedded text. For example, you may want your SNMP server to trigger an alert when some number of "result: FAIL" results occur.
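
As an illustration of parsing the embedded text, here is a minimal Python sketch that extracts the action, result, reason and arguments fields and reports whether the number of FAIL results reaches a threshold. It is not an RSA-supplied tool: the function names, the threshold parameter, and the regular expression are assumptions, and the field order is inferred only from the two sample traps in this article. How the text reaches the script depends on your SNMP server and is not covered here.

```python
import re
from collections import Counter

# Embedded text of item 5, copied from the two captures in this article.
COMPLETE = (
    "System event {ID: 4810594e5f63650a01686e670039b7c7, "
    "time: Tue Jan 19 14:30:16 EST 2010, client: null, user: null, "
    "action: REPLICATION_LINK_STATUS, action id: 16259, result: SUCCESS, "
    "reason: REPLICATION_LINK_STATUS_COMPLETE, arguments: "
    "[w2k3am71-95.jm-vm.com, w2k3am71-96.jm-vm.com, primary to replica]}"
)
UNREACHABLE = (
    "System event {ID: 481421435f63650a02051f553522996a, "
    "time: Tue Jan 19 14:34:23 EST 2010, client: null, user: null, "
    "action: REPLICATION_LINK_STATUS, action id: 16259, result: FAIL, "
    "reason: REPLICATION_LINK_STATUS_UNREACHABLE, arguments: "
    "[w2k3am71-95.jm-vm.com, w2k3am71-96.jm-vm.com, primary to replica]}"
)

# Assumption: fields appear in this order, as in the two samples above.
EVENT_RE = re.compile(
    r"action: (?P<action>\w+), action id: (?P<action_id>\d+), "
    r"result: (?P<result>\w+), reason: (?P<reason>\w+), "
    r"arguments: \[(?P<arguments>[^\]]*)\]"
)


def parse_event(text):
    """Return the event fields as a dict, or None if the text does not match."""
    match = EVENT_RE.search(text)
    if match is None:
        return None
    event = match.groupdict()
    # Split the bracketed argument list into individual, trimmed values.
    event["arguments"] = [
        part.strip() for part in event["arguments"].split(",") if part.strip()
    ]
    return event


def should_alert(texts, threshold=1):
    """Return True when at least `threshold` parsed events have result FAIL."""
    results = Counter()
    for text in texts:
        event = parse_event(text)
        if event is not None:
            results[event["result"]] += 1
    return results["FAIL"] >= threshold


print(parse_event(UNREACHABLE)["reason"])
print(should_alert([COMPLETE, UNREACHABLE]))
```

With the two sample events, the script prints REPLICATION_LINK_STATUS_UNREACHABLE and then True, because one of the two events has result FAIL and the default threshold is 1.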


 


Search your installation media for files named *.asn1; these are the AM 7.1 MIBs. Common locations are:

\auth_mgr\(os)\am\mibs\AM.asn1
\auth_mgr\(os)\ims\ims.kit\mibs\IMS.asn1

The IMS.asn1 MIB defines the trap event (17), which is the only trap event. The rest of the information in the MIBs relates to SNMP GET operations.
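
If the media is mounted or copied to a local directory, the search for *.asn1 files can be scripted. The following is a minimal Python sketch; the function name find_mib_files and the directory you pass to it are assumptions, and the pattern match is case-sensitive on most non-Windows file systems.

```python
from pathlib import Path


def find_mib_files(media_root):
    """Return every file named *.asn1 under media_root, in sorted order."""
    return sorted(
        path for path in Path(media_root).rglob("*.asn1") if path.is_file()
    )


# Example call (the path is hypothetical; use wherever your media is mounted):
# for path in find_mib_files("/mnt/am71_media"):
#     print(path)
```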


 


The RSA Appliance 3.x model 250 has an additional MIB, 10892.mib, that is specific to the Dell hardware used for this appliance, and a manual is available for it. This MIB is configured through the Operations Console of the appliance and is unrelated to the AM 7.1 SNMP settings made through the Security Console. Although the Appliance 130 has these configuration settings available in the Operations Console, the Appliance 130 hardware doesn't actually have this capability, so this should not be configured on an Appliance 130. It is also not applicable to AM 7.1 installed on a non-appliance server.

Legacy Article ID: a61946
