ntpd both running and not running?

Started by danb35, July 25, 2020, 05:51:53 PM

Previous topic - Next topic
I recently installed OPNsense (currently on 20.1.9), and it's working well, though with a few oddities.  One of them is that, at least from what the web UI shows, the ntpd service won't start.  The dashboard shows:
(see attachment)
...and in what's probably the relevant section, the log shows:
(see other attachment)

And indeed, there is a process listening on *:123, but it's ntpdate, not ntpd:

root@opnsense:~ # sockstat -l | grep :123
root     ntpdate    77157 4  udp6   *:123                 *:*
root     ntpdate    77157 5  udp4   *:123                 *:*
root@opnsense:~ # service ntpdate status
Cannot 'status' ntpdate. Set ntpdate_enable to YES in /etc/rc.conf or use 'onestatus' instead of 'status'.
root@opnsense:~ # service ntpdate onestatus
ntpdate is not running.

Something's strange--what should the next steps be to track this down?

July 25, 2020, 06:53:21 PM #1 Last Edit: July 25, 2020, 08:05:26 PM by danb35
A little more searching has found https://forum.opnsense.org/index.php?topic=9980.0 in German, which seems to be about the same problem.  Like in that thread, I find that the PID of the ntpdate process is constantly changing.  I likewise have nothing relevant in /etc/rc.conf, and don't have rc.conf.local at all.

Edit:  When I run ps aux, I see this in the list:

root    15508   0.0  0.1 1058416  3136  -  Ss   12:58      0:00.01 /bin/sh /usr/local/etc/inc/plugins.inc.d/ntpd/ntpdate_sync_once.


I'm having a little trouble parsing that script, but from what I can tell it's supposed to try three times to sync and then give up if it failed.  But the system uptime is a couple of days, it's still coming up in the "ps aux" listing, and new ntpdate processes keep getting started.

Interestingly, there's nothing at all in the ntpd log since some time yesterday, even though I'd tried to start the service again since then.


Hi there,

ntpdate_sync_once.sh is an historic (and arcane) bit of software that defers NTP daemon start and sets the right time through ntpdate before starting NTP daemon itself. Some of this stems from the fact that NTP daemon may only slowly update time or fail to do so if drift is too wide.

Problem is it looks "broken" once you startup the device or after applying interface settings etc., but in the end ntpdate finishes and ntpd starts properly without further need to handle it.


Cheers,
Franco

Quote
in the end ntpdate finishes and ntpd starts properly without further need to handle it.
So how long does this take?  Because this system is now at a week of uptime, and repeated ntpdate processes are still spawning, blocking ntpd from starting.

It should take 30 seconds. If it takes longer it indicates possible connectivity problems with ntpdate?


Cheers,
Franco

Quote from: franco on July 29, 2020, 01:29:16 PM
If it takes longer it indicates possible connectivity problems with ntpdate?
Don't know about that:

root@opnsense:~ # ntpdate -q pool.ntp.org
server 64.225.34.103, stratum 2, offset 0.279882, delay 0.10892
server 162.159.200.123, stratum 3, offset 0.290670, delay 0.04358
server 209.50.63.74, stratum 4, offset 0.290167, delay 0.10580
server 138.68.46.177, stratum 2, offset 0.288142, delay 0.10635
29 Jul 07:30:11 ntpdate[23722]: adjust time server 138.68.46.177 offset 0.288142 sec

But still, ntpdate processes keep starting:

root@opnsense:~ # sockstat -l | grep :123
root     ntpdate    35153 4  udp6   *:123                 *:*
root     ntpdate    35153 5  udp4   *:123                 *:*
root@opnsense:~ # sockstat -l | grep :123
root@opnsense:~ # sockstat -l | grep :123
root     ntpdate    14746 4  udp6   *:123                 *:*
root     ntpdate    14746 5  udp4   *:123                 *:*
root@opnsense:~ # sockstat -l | grep :123
root     ntpdate    31092 4  udp6   *:123                 *:*
root     ntpdate    31092 5  udp4   *:123                 *:*

I could, of course, try to reboot the system--but that sounds like an awfully Windows-y troubleshooting step.

Something in this loop isn't quite right then:

https://github.com/opnsense/core/blob/master/src/etc/inc/plugins.inc.d/ntpd/ntpdate_sync_once.sh#L13

Would be interested in the log:

# clog /var/log/ntpd.log | grep attempts


Cheers,
Franco


root@opnsense:~ # clog /var/log/ntpd.log | grep attempts
Jul 21 16:11:25 opnsense ntp: Giving up on time sync after 3 attempts.
Jul 21 17:28:14 opnsense ntp: Successfully synced time after 1 attempts.
Jul 21 17:36:33 opnsense ntp: Successfully synced time after 1 attempts.
Jul 21 17:38:57 opnsense ntp: Successfully synced time after 1 attempts.
Jul 21 17:41:13 opnsense ntp: Successfully synced time after 1 attempts.
Jul 21 18:50:34 opnsense ntp: Successfully synced time after 1 attempts.
Jul 21 18:52:43 opnsense ntp: Successfully synced time after 1 attempts.


It's probably noteworthy, though, that there's nothing at all in ntpd.log since 24 July:

Jul 23 10:38:29 opnsense ntpdate[33552]: 2001:470:88f6:: rate limit response from server.
Jul 23 10:38:39 opnsense ntpdate[33552]: 2001:19f0:5:daf::2 rate limit response from server.
Jul 23 13:19:38 opnsense ntpdate[89457]: 12.167.151.1 rate limit response from server.
Jul 23 14:36:34 opnsense ntpdate[56908]: 64.225.34.103 rate limit response from server.
Jul 24 01:08:58 opnsense ntpdate[71733]: 142.147.92.5 rate limit response from server.
Jul 24 04:17:43 opnsense ntpdate[73287]: 206.55.191.142 rate limit response from server.
Jul 24 05:09:01 opnsense ntpdate[24693]: 64.142.54.12 rate limit response from server.
Jul 24 06:50:46 opnsense ntpdate[47278]: 192.111.144.114 rate limit response from server.
Jul 24 06:50:56 opnsense ntpdate[6556]: 192.111.144.114 rate limit response from server.
CLOG}???

Ok, let's see what's wrong...

# sh -x /usr/local/etc/inc/plugins.inc.d/ntpd/ntpdate_sync_once.sh

The script isn't handling ntpdate being stuck very well. Might be enough to run the following to confirm:

# killall ntpdate


Cheers,
Franco

July 29, 2020, 01:48:43 PM #10 Last Edit: July 29, 2020, 01:50:18 PM by danb35

root@opnsense:~ # sh -x /usr/local/etc/inc/plugins.inc.d/ntpd/ntpdate_sync_once.sh
+ NOTSYNCED=true
+ MAX_ATTEMPTS=3
+ /bin/cat /conf/config.xml
+ /usr/bin/grep timeservers
+ /usr/bin/cut '-d>' -f2
+ /usr/bin/cut '-d<' -f1
+ SERVER='0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org'
+ [ '0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org' '=' '' ]
+ /bin/pkill -f ntpdate_sync_once.sh
+ ATTEMPT=1
+ [ true '=' true ]
+ [ 1 -le 3 ]
+ [ true ]
+ /usr/bin/killall ntpdate
+ /bin/pgrep ntpd
+ [ 1 -eq 0 ]
+ break
+ sleep 1
Terminated


Edit:

root@opnsense:~ # killall ntpdate
root@opnsense:~ # sh -x /usr/local/etc/inc/plugins.inc.d/ntpd/ntpdate_sync_once.sh
+ NOTSYNCED=true
+ MAX_ATTEMPTS=3
+ /bin/cat /conf/config.xml
+ /usr/bin/cut '-d>' -f2
+ /usr/bin/cut '-d<' -f1
+ /usr/bin/grep timeservers
+ SERVER='0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org'
+ [ '0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org' '=' '' ]
+ /bin/pkill -f ntpdate_sync_once.sh
+ ATTEMPT=1
+ [ true '=' true ]
+ [ 1 -le 3 ]
+ [ true ]
+ /usr/bin/killall ntpdate
+ /bin/pgrep ntpd
+ [ 1 -eq 0 ]
+ break
+ sleep 1
+ /usr/local/sbin/ntpdate -s -t 5 0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org
Terminated
root@opnsense:~ #


PS: I hate this forum software!  Why does it return you to the forum index once you save a post?

Do you see segmentation faults in "dmesg" output?


root@opnsense:~ # dmesg | grep -i fault
root@opnsense:~ #

grep "core dumped" please, looks like it doesn't use "fault" in the message, e.g.

pid 63091 (syslog-ng), jid 0, uid 0: exited on signal 11 (core dumped)


Cheers,
Franco


root@opnsense:~ # dmesg | grep -i "core dumped"
root@opnsense:~ #