Appliance fails to join because of slow network switch negotiation
Originally Published: 2012-05-28
Article Number
000046226
Applies To
RSA Data Protection Manager Appliance 3.1.2
Issue
A DPM Appliance fails to join another appliance, and the following errors appear in the log file /opt/appliance/logs/rkma-system.log during the join operation:

2012-02-28 11:08:54,323 ERROR - com.rsa.appliance.sys.service.impl.NewSetupApplianceServiceImpl.processErrorsIfAny(NewSetupApplianceServiceImpl.java:462) : Exception occurred: Error on copy from remote box:Copying file /version.txt from 10.10.17.55...
Copy ... Failed.False
Remote server is unreachable.
2012-02-28 11:08:54,323 ERROR - com.rsa.appliance.sys.service.impl.NewSetupApplianceServiceImpl.processErrorsIfAny(NewSetupApplianceServiceImpl.java:483) : Could not connect to the provided remote IP 10.10.17.55 as QUSER.
2012-02-28 11:08:54,324 ERROR - com.rsa.appliance.sys.service.impl.NewSetupApplianceServiceImpl.processErrorsIfAny(NewSetupApplianceServiceImpl.java:486) : Exception occurred: Error while trying to connect to the remote host:Copying file /version.txt from 10.10.17.55...
Copy ... Failed.False
Remote server is unreachable.
2012-02-28 11:08:54,325 ERROR - error.setup.software.configuration.failed
com.rsa.appliance.exception.BusinessServiceException
 at com.rsa.appliance.sys.service.impl.NewSetupApplianceServiceImpl.processErrorsIfAny(NewSetupApplianceServiceImpl.java:493)
 at com.rsa.appliance.sys.service.impl.NewSetupApplianceServiceImpl.validateHostAndPwdAndCopyCertificates(NewSetupApplianceServiceImpl.java:405)
 at com.rsa.appliance.sys.service.impl.NewSetupApplianceServiceImpl.validateClusterJoinReadiness(NewSetupApplianceServiceImpl.java:334)
 at com.rsa.appliance.sys.service.impl.NewSetupApplianceServiceImpl.setupAppliance(NewSetupApplianceServiceImpl.java:162)
 at com.rsa.appliance.sys.scheduler.QuickSetupJob.executeJob(QuickSetupJob.java:66)
 at com.rsa.appliance.sys.taskmanagement.BaseJob.execute(BaseJob.java:167)
 at org.quartz.core.JobRunShell.run(JobRunShell.java:202)
 at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:534)
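
To confirm that a failed join matches this article, the log can be searched for the two messages that appear together in the excerpt above. The following is a minimal sketch, not part of the appliance software: the function name hasJoinCopyFailure is hypothetical, and the default path is the log file named in this article.

```shell
#!/bin/bash
# Hypothetical helper (not shipped with the appliance): returns 0 when the
# given log contains both messages seen in the failed join shown above.
hasJoinCopyFailure() {
    # Default to the log file named in this article; pass another path to override.
    local logFile=${1:-/opt/appliance/logs/rkma-system.log}

    # Both strings must be present somewhere in the log.
    grep -q "Error on copy from remote box" "$logFile" &&
        grep -q "Remote server is unreachable" "$logFile"
}
```

For example, running `hasJoinCopyFailure && echo "Join failure matches this article"` on the joining appliance prints the message only when both strings are found.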

Resolution
This issue is fixed in the next release, DPM Appliance 3.2 (not yet released at the time of writing).
As a workaround for DPM Appliance 3.1.2, edit the script /opt/rsa/setup/sh/copy_functions.sh before starting the join operation and add a "ping -c30" command, as shown between the KMA-2623 comment markers in the following excerpt. The command sends 30 ping requests to the remote appliance, which gives a slow switch port time to finish negotiating before the file copy is attempted:

function copyDummyFileFromRemoteServer()
{
username=quser
password=$2
remoteServer=$1
file=/version.txt
copy_dir=/opt/rsa/setup/work
        mkdir -p /opt/rsa/setup/work
        rm -f /opt/rsa/setup/work/version.txt
        rm -f /root/.ssh/known_hosts
        ### KMA-2623 ###
        echo "Pinging host $remoteServer for 30 counts to allow slow switch port negotiation to occur"
        ping -c30 $remoteServer
        ################
        echo "Copying file $file from $remoteServer... "
        COPY_STATUS=`python /opt/rsa/setup/py/GetFileFromRemoteServerNew.py $username $password $remoteServer $file $copy_dir`
        retval=$?
        if [ $retval != 0 ]
        then
                echo "Copy ... Failed.$COPY_STATUS"
                return $retval
        else
                if [ ! -f /opt/rsa/setup/work/version.txt ]; then
                        echo "Could not copy the file"
                        return 1
                fi
                echo "Copy Done"
                return 0
        fi
}
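
The workaround above always sends 30 pings, even when the switch port is ready sooner. As a possible variation, the wait can stop at the first reply. The following is a minimal sketch under stated assumptions and is not the vendor's fix: the function name waitForRemoteServer and the variables PING_CMD and RETRY_DELAY are hypothetical, and the ping options assume the Linux iputils version of ping.

```shell
#!/bin/bash
# Hypothetical helper (not part of copy_functions.sh): ping the remote host
# once per attempt and stop as soon as it answers, up to maxAttempts tries.
waitForRemoteServer() {
    local remoteServer=$1
    local maxAttempts=${2:-30}            # same upper bound as "ping -c30"
    local pingCmd=${PING_CMD:-ping}       # assumption: overridable for testing
    local retryDelay=${RETRY_DELAY:-1}    # seconds to pause after a failed attempt
    local attempt=1

    while [ "$attempt" -le "$maxAttempts" ]; do
        # -c 1: send one request; -W 1: wait at most 1 second for the reply
        if $pingCmd -c 1 -W 1 "$remoteServer" > /dev/null 2>&1; then
            echo "Host $remoteServer answered after $attempt attempt(s)"
            return 0
        fi
        sleep "$retryDelay"
        attempt=$((attempt + 1))
    done

    echo "Host $remoteServer did not answer after $maxAttempts attempts"
    return 1
}
```

A call such as `waitForRemoteServer "$remoteServer" 30` would take the place of the `ping -c30 $remoteServer` line. Unlike the fixed ping, it returns a nonzero status when the host never answers, so the caller can decide whether to attempt the copy at all.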