000013721 - Time drift may cause NTP to not establish any peers in RSA Data Protection Manager

Document created by RSA Customer Support Employee on Jun 14, 2016. Last modified by RSA Customer Support Employee on Apr 22, 2017.
Version 2

Article Content

Article Number: 000013721
Applies To: RSA Data Protection Manager Appliance 3.x
Issue: Resolve time drift issues that may cause NTP to no longer establish peers on a DPM appliance.
The time on the appliance drifts either forward or backward.
NTP is unable to establish any peers.
Cause: Time drift issues are often caused by inaccuracies in the way that the system clock records time.
NTP may be unable to sync because the time drift on an appliance is too great.
Resolution: The first step in troubleshooting time sync issues is determining whether there is time drift. Drift usually affects the system clock but not the hardware clock. To compare the two clocks, run the following:
# hwclock --show && date
Wed 03 Apr 2013 06:39:01 AM MDT  -0.795791 seconds
Wed Apr  3 06:34:26 MDT 2013

This shows the time from the hardware clock followed by the time from the system (software) clock. In the example above, the system clock is 4 minutes 35 seconds behind the hardware clock. If the two clocks differ and the difference is not consistent between runs, you have a time drift issue. To measure how significant the drift is, run ntpq and execute the peers command within it. The output should be similar to the following:

ntpq> peers

      remote           refid      st t when poll reach   delay   offset  jitter
  tik.cesnet.cz   .GPS.            1 u   12   64  377    0.641  8494.05 2911.29
  tak.cesnet.cz   .GPS.            1 u    2   64  377    0.636  8594.86 2945.05

As the example above shows, the offset (reported in milliseconds) is large, which indicates a big difference between the local system time and the time on the NTP servers being queried. The jitter value is also high, which indicates that the offset is not consistent from one poll to the next.
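
The check described above can be scripted. The following is a minimal sketch, not part of the original procedure: flag_bad_peers is a hypothetical helper name, and the default limits of 128 ms for offset and 100 ms for jitter are illustrative assumptions rather than RSA-recommended values. It reads peers-style output on standard input and prints any peer whose offset or jitter exceeds the limits.

```shell
# flag_bad_peers (hypothetical helper): read `ntpq -p` style output on stdin
# and print every peer whose absolute offset or jitter exceeds the limits.
# $1: maximum offset in ms (assumed default 128)
# $2: maximum jitter in ms (assumed default 100)
flag_bad_peers() {
    max_offset=${1:-128}
    max_jitter=${2:-100}
    awk -v mo="$max_offset" -v mj="$max_jitter" '
        # Skip the header row and the ===== separator line.
        $1 == "remote" || /^=+$/ { next }
        NF >= 10 {
            off = $(NF-1) + 0          # offset is the second-to-last column
            if (off < 0) off = -off
            jit = $NF + 0              # jitter is the last column
            if (off > mo + 0 || jit > mj + 0)
                printf "%s offset=%s jitter=%s\n", $1, $(NF-1), $NF
        }'
}

# Possible usage on the appliance (requires a running ntpd):
#   ntpq -pn | flag_bad_peers 128 100
```

With the sample output shown earlier, both peers would be reported, since their offsets are in the thousands of milliseconds.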

These factors often prevent NTP from establishing a peer to sync time from. You can get a history of the offset using the as (associations) and rv (readvar) commands within ntpq.

ntpq> as
ind assID status  conf reach auth condition  last_event cnt
  1 55713  9014   yes   yes  none    reject   reachable   1
  2 55714  9014   yes   yes  none    reject   reachable   1

ntpq> rv 55713
assID=55713 status=9014 reach, conf, 1 event, event_reach,
srcadr=tik.cesnet.cz, srcport=123, dstadr=, dstport=123,
leap=00, stratum=1, precision=-20, rootdelay=0.000,
rootdispersion=0.000, refid=GPS, reach=377, unreach=0, hmode=3, pmode=4,
hpoll=6, ppoll=6, flash=400 peer_dist, keyid=0, ttl=0, offset=13041.231,
delay=0.602, dispersion=0.944, jitter=2918.331,
reftime=cf803b51.ddd3e70e Mon, Apr 26 2010 18:18:25.866,
org=cf803b83.e9b29181 Mon, Apr 26 2010 18:19:15.912,
rec=cf803b76.df382c7c Mon, Apr 26 2010 18:19:02.871,
xmt=cf803b76.df0d40c7 Mon, Apr 26 2010 18:19:02.871,
filtdelay= 0.60 0.64 0.60 0.51 0.82 0.67 0.69 0.64,
filtoffset= 13041.2 12385.8 11720.4 11075.2 10409.6 9774.54 9129.22 8494.06,
filtdisp= 0.00 0.98 1.97 2.93 3.92 4.86 5.82 6.77

In the example above, the filtoffset values are listed newest first (the first value matches the current offset), so you can see that the offset has been increasing over time. This makes NTP unable to establish a peer, even if you stop ntpd and use ntpdate to force the time to update.
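
To put a number on that trend, the filtoffset line can be turned into an approximate drift rate. This is a rough sketch under stated assumptions: drift_ppm is a hypothetical helper name, the samples are assumed to be one poll interval apart with the newest first, and the poll interval is assumed to be constant (64 seconds when hpoll=6, as in the example).

```shell
# drift_ppm (hypothetical helper): estimate clock drift from `rv` output.
# stdin: output of the ntpq rv command, containing a line starting "filtoffset="
# $1:    poll interval in seconds (assumed constant; default 64)
# Prints the drift in parts per million; positive means the offset is growing.
drift_ppm() {
    poll=${1:-64}
    awk -v poll="$poll" '
        /^filtoffset=/ {
            sub(/^filtoffset= */, "")   # drop the label
            gsub(/,/, "")               # drop the trailing comma
            n = NF                      # number of offset samples
            if (n < 2) exit 1
            newest = $1; oldest = $NF
            # ms gained per elapsed second, times 1000, gives ppm
            printf "%.0f\n", (newest - oldest) / ((n - 1) * poll) * 1000
        }'
}

# Possible usage on the appliance (association ID taken from the as output):
#   ntpq -c "rv 55713" | drift_ppm 64
```

For the sample rv output above, this gives roughly 10150 ppm, meaning the system clock is losing about 1% against the NTP server.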

Since the offset is increasing, we know that the system clock is falling behind: the kernel isn't adding enough time to the system clock on each timer tick. We can get the current tick value (the number of microseconds added per tick) using tickadj:

# tickadj
tick = 10000

Based on this, you can start making small incremental changes to the tick value to compensate for the drift. For example, you could set the tick value to 10100 to counter a positive offset that keeps increasing. Conversely, you could start with a value of 9900 if the offset is negative and keeps growing more negative.

# tickadj 10100
tick = 10100
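
If you have an estimate of the drift rate, a first guess for the tick value can be calculated instead of chosen by trial. The sketch below is only a starting point and still needs the measure-and-adjust cycle that follows; suggest_tick is a hypothetical helper name, and the drift is assumed to be expressed in parts per million (an offset growing by about 650 ms per 64-second poll is roughly 10000 ppm, or 1%).

```shell
# suggest_tick (hypothetical helper): first-guess tick value from measured drift.
# $1: current tick value, as printed by tickadj
# $2: drift in ppm; positive when the offset keeps growing (system clock slow),
#     negative when the offset keeps shrinking (system clock fast)
suggest_tick() {
    awk -v tick="$1" -v ppm="$2" 'BEGIN {
        # Scale the tick by the same fraction the clock is off by.
        printf "%.0f\n", tick * (1 + ppm / 1000000)
    }'
}

# Example: a clock losing 1% with the default tick
#   suggest_tick 10000 10000     # prints 10100
```

A 1% drift with a tick of 10000 lands on the 10100 used in the example above, and a drift of -10000 ppm gives 9900.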

Once you have adjusted the tick value, stop ntpd, run ntpdate to sync the time manually, start ntpd, and use ntpq -p to watch the offset over a period of several minutes.

# service ntpd stop
# ntpdate my.time.server.com
# service ntpd start
# ntpq -p

You will likely need to adjust the tick value several times until the offset stays under 0.5. A positive offset means the tick value needs to be adjusted higher. A negative offset means you need to adjust it lower. There will always be some amount of offset, so close is good enough.
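
The adjustment rule above can be written down as a small helper for use between measurement rounds. This is an illustrative sketch only: next_tick is a hypothetical name, and the step size of 10 is an assumption chosen to keep changes small. The 0.5 limit comes from the text above.

```shell
# next_tick (hypothetical helper): pick the next tick value to try.
# $1: current tick value
# $2: latest offset reported by `ntpq -p`
# $3: step size for each adjustment (assumed default 10)
# $4: acceptable absolute offset (default 0.5, from the text above)
next_tick() {
    awk -v tick="$1" -v off="$2" -v step="${3:-10}" -v lim="${4:-0.5}" 'BEGIN {
        if (off + 0 > lim + 0)      print tick + step   # positive offset: raise the tick
        else if (off + 0 < -lim)    print tick - step   # negative offset: lower the tick
        else                        print tick + 0      # close enough: keep the tick
    }'
}

# Example: an offset of 8.2 with tick 10100 suggests trying 10110
#   next_tick 10100 8.2
```

The value it prints is what you would then pass to tickadj before repeating the stop, ntpdate, start, and ntpq -p cycle.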

Once you have managed to stabilize the offset to stay at a low and consistent value, NTP should be able to establish a peer. When viewing the list of peers using ntpq -p, you will see an asterisk next to the chosen peer.

# ntpq -p

      remote           refid      st t when poll reach   delay   offset  jitter
  tik.cesnet.cz   .GPS.            1 u   12   64  377    0.641  0.05864 0.78158
*tak.cesnet.cz   .GPS.            1 u    2   64  377    0.636  0.04973 0.77895
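
Checking for the asterisk can also be automated, for example while waiting for ntpd to settle after a tick change. The following is a minimal sketch; selected_peer is a hypothetical helper name, and it assumes the chosen peer is the line that begins with an asterisk, as in the output above.

```shell
# selected_peer (hypothetical helper): read `ntpq -p` output on stdin and
# print the name of the peer marked with "*". The exit status is non-zero
# when no peer has been selected yet.
selected_peer() {
    awk '
        /^\*/ { print substr($1, 2); found = 1 }   # strip the leading "*"
        END   { exit !found }
    '
}

# Possible usage on the appliance: poll until ntpd has chosen a peer.
#   until ntpq -pn | selected_peer; do sleep 64; done
```

For the sample output above, the helper prints tak.cesnet.cz.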

Once you have successfully established a peer, you shouldn't see any more time sync issues.
Notes: Additional information on troubleshooting NTP can be found in this cached blog post from Google: http://webcache.googleusercontent.com/search?q=cache:lFo96MJ87mkJ:log.or.cz/%3Fp%3D80+&cd=1&hl=en&ct=clnk&gl=us
You can find information on tickadj from its man page: http://linux.die.net/man/8/tickadj
Legacy Article ID: a61104