Print Page - ntpd both running and not running?

Title: ntpd both running and not running?
Post by: danb35 on July 25, 2020, 05:51:53 PM

I recently installed OPNsense (currently on 20.1.9), and it's working well, though with a few oddities. One of them is that, at least from what the web UI shows, the ntpd service won't start. The dashboard shows:
(see attachment)
...and in what's probably the relevant section, the log shows:
(see other attachment)

And indeed, there is a process listening on *:123, but it's ntpdate, not ntpd:

Code Select


root@opnsense:~ # sockstat -l | grep :123
root     ntpdate    77157 4  udp6   *:123                 *:*
root     ntpdate    77157 5  udp4   *:123                 *:*
root@opnsense:~ # service ntpdate status
Cannot 'status' ntpdate. Set ntpdate_enable to YES in /etc/rc.conf or use 'onestatus' instead of 'status'.
root@opnsense:~ # service ntpdate onestatus
ntpdate is not running.

Something's strange--what should the next steps be to track this down?

Title: Re: ntpd both running and not running?
Post by: danb35 on July 25, 2020, 06:53:21 PM

A little more searching has found https://forum.opnsense.org/index.php?topic=9980.0 in German, which seems to be about the same problem. Like in that thread, I find that the PID of the ntpdate process is constantly changing. I likewise have nothing relevant in /etc/rc.conf, and don't have rc.conf.local at all.

Edit: When I run ps aux, I see this in the list:

Code Select


root    15508   0.0  0.1 1058416  3136  -  Ss   12:58      0:00.01 /bin/sh /usr/local/etc/inc/plugins.inc.d/ntpd/ntpdate_sync_once.

I'm having a little trouble parsing that script, but from what I can tell it's supposed to try three times to sync and then give up if it failed. But the system uptime is a couple of days, it's still coming up in the "ps aux" listing, and new ntpdate processes keep getting started.

Interestingly, there's nothing at all in the ntpd log since some time yesterday, even though I'd tried to start the service again since then.

Title: Re: ntpd both running and not running?
Post by: danb35 on July 29, 2020, 02:43:35 AM

Anyone?

Title: Re: ntpd both running and not running?
Post by: franco on July 29, 2020, 01:20:18 PM

Hi there,

ntpdate_sync_once.sh is an historic (and arcane) bit of software that defers NTP daemon start and sets the right time through ntpdate before starting NTP daemon itself. Some of this stems from the fact that NTP daemon may only slowly update time or fail to do so if drift is too wide.

Problem is it looks "broken" once you startup the device or after applying interface settings etc., but in the end ntpdate finishes and ntpd starts properly without further need to handle it.

Cheers,
Franco

Title: Re: ntpd both running and not running?
Post by: danb35 on July 29, 2020, 01:28:02 PM

Quote
in the end ntpdate finishes and ntpd starts properly without further need to handle it.

So how long does this take? Because this system is now at a week of uptime, and repeated ntpdate processes are still spawning, blocking ntpd from starting.

Title: Re: ntpd both running and not running?
Post by: franco on July 29, 2020, 01:29:16 PM

It should take 30 seconds. If it takes longer it indicates possible connectivity problems with ntpdate?

Cheers,
Franco

Title: Re: ntpd both running and not running?
Post by: danb35 on July 29, 2020, 01:33:45 PM

Quote from: franco on July 29, 2020, 01:29:16 PM
If it takes longer it indicates possible connectivity problems with ntpdate?

Don't know about that:

Code Select


root@opnsense:~ # ntpdate -q pool.ntp.org
server 64.225.34.103, stratum 2, offset 0.279882, delay 0.10892
server 162.159.200.123, stratum 3, offset 0.290670, delay 0.04358
server 209.50.63.74, stratum 4, offset 0.290167, delay 0.10580
server 138.68.46.177, stratum 2, offset 0.288142, delay 0.10635
29 Jul 07:30:11 ntpdate[23722]: adjust time server 138.68.46.177 offset 0.288142 sec

But still, ntpdate processes keep starting:

Code Select


root@opnsense:~ # sockstat -l | grep :123
root     ntpdate    35153 4  udp6   *:123                 *:*
root     ntpdate    35153 5  udp4   *:123                 *:*
root@opnsense:~ # sockstat -l | grep :123
root@opnsense:~ # sockstat -l | grep :123
root     ntpdate    14746 4  udp6   *:123                 *:*
root     ntpdate    14746 5  udp4   *:123                 *:*
root@opnsense:~ # sockstat -l | grep :123
root     ntpdate    31092 4  udp6   *:123                 *:*
root     ntpdate    31092 5  udp4   *:123                 *:*

I could, of course, try to reboot the system--but that sounds like an awfully Windows-y troubleshooting step.

Title: Re: ntpd both running and not running?
Post by: franco on July 29, 2020, 01:36:56 PM

Something in this loop isn't quite right then:

https://github.com/opnsense/core/blob/master/src/etc/inc/plugins.inc.d/ntpd/ntpdate_sync_once.sh#L13

Would be interested in the log:

# clog /var/log/ntpd.log | grep attempts

Cheers,
Franco

Title: Re: ntpd both running and not running?
Post by: danb35 on July 29, 2020, 01:39:34 PM

Code Select


root@opnsense:~ # clog /var/log/ntpd.log | grep attempts
Jul 21 16:11:25 opnsense ntp: Giving up on time sync after 3 attempts. 
Jul 21 17:28:14 opnsense ntp: Successfully synced time after 1 attempts. 
Jul 21 17:36:33 opnsense ntp: Successfully synced time after 1 attempts. 
Jul 21 17:38:57 opnsense ntp: Successfully synced time after 1 attempts. 
Jul 21 17:41:13 opnsense ntp: Successfully synced time after 1 attempts. 
Jul 21 18:50:34 opnsense ntp: Successfully synced time after 1 attempts. 
Jul 21 18:52:43 opnsense ntp: Successfully synced time after 1 attempts.

It's probably noteworthy, though, that there's nothing at all in ntpd.log since 24 July:

Code Select


Jul 23 10:38:29 opnsense ntpdate[33552]: 2001:470:88f6:: rate limit response from server. 
Jul 23 10:38:39 opnsense ntpdate[33552]: 2001:19f0:5:daf::2 rate limit response from server. 
Jul 23 13:19:38 opnsense ntpdate[89457]: 12.167.151.1 rate limit response from server. 
Jul 23 14:36:34 opnsense ntpdate[56908]: 64.225.34.103 rate limit response from server. 
Jul 24 01:08:58 opnsense ntpdate[71733]: 142.147.92.5 rate limit response from server. 
Jul 24 04:17:43 opnsense ntpdate[73287]: 206.55.191.142 rate limit response from server. 
Jul 24 05:09:01 opnsense ntpdate[24693]: 64.142.54.12 rate limit response from server. 
Jul 24 06:50:46 opnsense ntpdate[47278]: 192.111.144.114 rate limit response from server. 
Jul 24 06:50:56 opnsense ntpdate[6556]: 192.111.144.114 rate limit response from server. 
CLOG}???

Title: Re: ntpd both running and not running?
Post by: franco on July 29, 2020, 01:43:35 PM

Ok, let's see what's wrong...

# sh -x /usr/local/etc/inc/plugins.inc.d/ntpd/ntpdate_sync_once.sh

The script isn't handling ntpdate being stuck very well. Might be enough to run the following to confirm:

# killall ntpdate

Cheers,
Franco

Title: Re: ntpd both running and not running?
Post by: danb35 on July 29, 2020, 01:48:43 PM

Code Select


root@opnsense:~ # sh -x /usr/local/etc/inc/plugins.inc.d/ntpd/ntpdate_sync_once.sh
+ NOTSYNCED=true
+ MAX_ATTEMPTS=3
+ /bin/cat /conf/config.xml
+ /usr/bin/grep timeservers
+ /usr/bin/cut '-d>' -f2
+ /usr/bin/cut '-d<' -f1
+ SERVER='0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org'
+ [ '0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org' '=' '' ]
+ /bin/pkill -f ntpdate_sync_once.sh
+ ATTEMPT=1
+ [ true '=' true ]
+ [ 1 -le 3 ]
+ [ true ]
+ /usr/bin/killall ntpdate
+ /bin/pgrep ntpd
+ [ 1 -eq 0 ]
+ break
+ sleep 1
Terminated

Edit:

Code Select


root@opnsense:~ # killall ntpdate
root@opnsense:~ # sh -x /usr/local/etc/inc/plugins.inc.d/ntpd/ntpdate_sync_once.sh
+ NOTSYNCED=true
+ MAX_ATTEMPTS=3
+ /bin/cat /conf/config.xml
+ /usr/bin/cut '-d>' -f2
+ /usr/bin/cut '-d<' -f1
+ /usr/bin/grep timeservers
+ SERVER='0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org'
+ [ '0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org' '=' '' ]
+ /bin/pkill -f ntpdate_sync_once.sh
+ ATTEMPT=1
+ [ true '=' true ]
+ [ 1 -le 3 ]
+ [ true ]
+ /usr/bin/killall ntpdate
+ /bin/pgrep ntpd
+ [ 1 -eq 0 ]
+ break
+ sleep 1
+ /usr/local/sbin/ntpdate -s -t 5 0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org
Terminated
root@opnsense:~ #

PS: I hate this forum software! Why does it return you to the forum index once you save a post?

Title: Re: ntpd both running and not running?
Post by: franco on July 29, 2020, 02:24:19 PM

Do you see segmentation faults in "dmesg" output?

Title: Re: ntpd both running and not running?
Post by: danb35 on July 29, 2020, 03:33:04 PM

Code Select


root@opnsense:~ # dmesg | grep -i fault
root@opnsense:~ #

Title: Re: ntpd both running and not running?
Post by: franco on July 29, 2020, 03:45:42 PM

grep "core dumped" please, looks like it doesn't use "fault" in the message, e.g.

pid 63091 (syslog-ng), jid 0, uid 0: exited on signal 11 (core dumped)

Cheers,
Franco

Title: Re: ntpd both running and not running?
Post by: danb35 on July 29, 2020, 03:47:15 PM

Code Select


root@opnsense:~ # dmesg | grep -i "core dumped"
root@opnsense:~ #

Title: Re: ntpd both running and not running?
Post by: danb35 on July 31, 2020, 06:58:16 PM

Quote from: danb35 on July 29, 2020, 01:33:45 PM
I could, of course, try to reboot the system--but that sounds like an awfully Windows-y troubleshooting step.

Well, not having anything else to try, I did this. No change; ntpdate process continue to spawn, ntpd won't start, and /var/log/ntpd.log isn't updating. I do note that the reported time is off by -4 hours, which coincidentally enough is my offset from UTC.

Edit: I just upgraded to 20.7, so I'm not sure if I should continue the thread here. But here's what the log looked like after the reboot:

Code Select


Jul 31 09:16:51 opnsense ntpdate[12533]: Can't find host 0.pool.ntp.org: Name does not resolve (8) 
Jul 31 09:16:54 opnsense ntpdate[12533]: Can't find host 2.pool.ntp.org: Name does not resolve (8) 
Jul 31 09:16:56 opnsense ntpdate[12533]: no servers can be used, exiting 
Jul 31 09:17:25 opnsense ntpdate[94626]: Can't find host 0.pool.ntp.org: Name does not resolve (8) 
Jul 31 13:17:54 opnsense ntpdate[94626]: step time server 192.241.146.233 offset +14409.531949 sec 
Jul 31 13:17:54 opnsense ntp[16779]: Successfully synced time after 2 attempts. 
Jul 31 13:17:54 opnsense ntp[78028]: Starting NTP Daemon. 
Jul 31 13:17:54 opnsense ntpd[48610]: ntpd 4.2.8p15@1.3728-o Tue Jul 28 02:25:36 UTC 2020 (1): Starting 
Jul 31 13:17:54 opnsense ntpd[48610]: Command line: /usr/local/sbin/ntpd -g -c /var/etc/ntpd.conf -p /var/run/ntpd.pid 
Jul 31 13:17:54 opnsense ntpd[48610]: ---------------------------------------------------- 
Jul 31 13:17:54 opnsense ntpd[48610]: ntp-4 is maintained by Network Time Foundation, 
Jul 31 13:17:54 opnsense ntpd[48610]: Inc. (NTF), a non-profit 501(c)(3) public-benefit 
Jul 31 13:17:54 opnsense ntpd[48610]: corporation.  Support and training for ntp-4 are 
Jul 31 13:17:54 opnsense ntpd[48610]: available at https://www.nwtime.org/support 
Jul 31 13:17:54 opnsense ntpd[48610]: ---------------------------------------------------- 
Jul 31 13:17:54 opnsense ntpd[55576]: proto: precision = 0.288 usec (-22) 
Jul 31 13:17:54 opnsense ntpd[55576]: basedate set to 2020-07-16 
Jul 31 13:17:54 opnsense ntpd[55576]: gps base set to 2020-07-19 (week 2115) 
Jul 31 13:17:54 opnsense ntpd[55576]: restrict: 'monitor' cannot be disabled while 'limited' is enabled 
Jul 31 13:17:54 opnsense ntpd[55576]: Listen and drop on 0 v6wildcard [::]:123 
Jul 31 13:17:54 opnsense ntpd[55576]: Listen and drop on 1 v4wildcard 0.0.0.0:123 
Jul 31 13:17:54 opnsense ntpd[55576]: Listen normally on 2 igb2 [fe80::208:a2ff:fe0a:d506%3]:123 
Jul 31 13:17:54 opnsense ntpd[55576]: Listen normally on 3 igb2 192.168.1.1:123 
Jul 31 13:17:54 opnsense ntpd[55576]: Listen normally on 4 lo0 [::1]:123 
Jul 31 13:17:54 opnsense ntpd[55576]: Listen normally on 5 lo0 127.0.0.1:123 
Jul 31 13:17:54 opnsense ntpd[55576]: Listen normally on 6 igb2_vlan107 [fe80::208:a2ff:fe0a:d506%9]:123 
Jul 31 13:17:54 opnsense ntpd[55576]: Listen normally on 7 igb2_vlan107 192.168.107.1:123 
Jul 31 13:17:54 opnsense ntpd[55576]: Listening on routing socket on fd #28 for interface updates 
Jul 31 13:18:53 opnsense ntpd[55576]: ntpd exiting on signal 15 (Terminated) 
Jul 31 13:18:53 opnsense ntpd[55576]: 162.159.200.1 local addr 192.168.1.1 -> <null> 
Jul 31 13:18:53 opnsense ntpd[55576]: 23.131.160.7 local addr 192.168.1.1 -> <null> 
Jul 31 13:18:53 opnsense ntpd[55576]: 2600:3c01::f03c:92ff:fe12:add4 local addr fe80::208:a2ff:fe0a:d506%3 -> <null> 
Jul 31 13:18:53 opnsense ntpd[55576]: 90.187.99.165 local addr 192.168.1.1 -> <null>

It looks to me like ntpdate_sync_once.sh completed successfully, ntpd started, and about a minute later, something killed it. And I'm back to ntpdate starting over and over, and nothing at all being written to ntpd.log.

Title: Re: ntpd both running and not running?
Post by: danb35 on August 06, 2020, 01:15:26 PM

Any other ideas? Or since this problem is still present in 20.7, should I post this issue in that forum?

Title: Re: ntpd both running and not running?
Post by: mimugmail on August 06, 2020, 01:49:42 PM

How about new chrony plugin (devel)? Go to CLI and type "pkg install os-chrony-devel".
The just disable ntp and do the rest in chrony .. should be more solid than ntpd and I need some beta testers ;)

Title: Re: ntpd both running and not running?
Post by: danb35 on August 06, 2020, 10:41:34 PM

Since this looks to me like a bug, reported here:
https://github.com/opnsense/core/issues/4245

Title: Re: ntpd both running and not running?
Post by: danb35 on August 06, 2020, 10:42:33 PM

Quote from: mimugmail on August 06, 2020, 01:49:42 PM
How about new chrony plugin (devel)?

I'll consider it, but I'd still like to figure out why this one isn't working. What does Chrony do differently that might make it a better solution?

Title: Re: ntpd both running and not running?
Post by: danb35 on August 13, 2020, 12:57:50 AM

So apparently the official answer (https://twitter.com/opnsense/status/1293624624606072834 - it's on Twitter, so it must be official, right?) is "we don't know what's going on, we don't know how to track it down, and we can't be bothered to try to figure it out." Disappointing.

For some updates--the problem is present in 20.1.9 on a Netgate RCC-VE-2440. It's still there when I upgrade that installation to 20.7. It's also present on a clean install of 20.7 on a Protectli FW4B, with the configuration file imported from the previous system. So maybe it's something that's only affecting me, but it's doing it across two different releases of the software and on two different systems.

So, since this seems to be at a dead end, on to Chrony. It installs just fine. It starts up and appears to run without errors, which puts it ahead of ntpd. I'm noticing what seems to be one bug, though--there's a field in the GUI for "Allowed Networks", but it only lets me enter one network--when I enter a second one, I get a validation error.

Title: Re: ntpd both running and not running?
Post by: franco on August 13, 2020, 01:05:35 PM

> we don't know what's going on, we don't know how to track it down, and we can't be bothered to try to figure it out.

To be frank, it's not what the Twitter response actually says.

Looking at https://blog.fosketts.net/2015/09/21/the-ideal-pfsense-platform-netgate-rcc-ve-2440/ it would seem that "Rangeley dual-core Atom 1.7 GHz CPU" is pretty thin and the Twitter response already states that both Ntpd and OpenVPN issues may be related to a sluggish system performance. We also point to a ticket where the OpenVPN issue is explained and log messages exist to diagnose this particular problem.

I can't spend time on this personally beyond a certain amount of community help as I am neither paid in something to do with OPNsense nor willing to cut short my off-work family time. I'm sorry.

Maybe someone else can. There is also a commercial offering and it is usually pretty good. But I feel that is not the point of this live debugging session anymore.

Cheers,
Franco

Title: Re: ntpd both running and not running?
Post by: mimugmail on August 13, 2020, 03:48:33 PM

Can you Open a Bug Report in GitHub regarding Allowed Networks? I fix this after vacation

OPNsense Forum

Archive => 20.1 Legacy Series => Topic started by: danb35 on July 25, 2020, 05:51:53 PM