I recently installed OPNsense (currently on 20.1.9), and it's working well, though with a few oddities. One of them is that, at least from what the web UI shows, the ntpd service won't start. The dashboard shows:
(see attachment)
...and in what's probably the relevant section, the log shows:
(see other attachment)
And indeed, there is a process listening on *:123, but it's ntpdate, not ntpd:
root@opnsense:~ # sockstat -l | grep :123
root ntpdate 77157 4 udp6 *:123 *:*
root ntpdate 77157 5 udp4 *:123 *:*
root@opnsense:~ # service ntpdate status
Cannot 'status' ntpdate. Set ntpdate_enable to YES in /etc/rc.conf or use 'onestatus' instead of 'status'.
root@opnsense:~ # service ntpdate onestatus
ntpdate is not running.
Something's strange--what should the next steps be to track this down?
A little more searching has found https://forum.opnsense.org/index.php?topic=9980.0 in German, which seems to be about the same problem. Like in that thread, I find that the PID of the ntpdate process is constantly changing. I likewise have nothing relevant in /etc/rc.conf, and don't have rc.conf.local at all.
Edit: When I run ps aux, I see this in the list:
root 15508 0.0 0.1 1058416 3136 - Ss 12:58 0:00.01 /bin/sh /usr/local/etc/inc/plugins.inc.d/ntpd/ntpdate_sync_once.
I'm having a little trouble parsing that script, but from what I can tell it's supposed to try three times to sync and then give up if it failed. But the system uptime is a couple of days, it's still coming up in the "ps aux" listing, and new ntpdate processes keep getting started.
Interestingly, there's nothing at all in the ntpd log since some time yesterday, even though I'd tried to start the service again since then.
Anyone?
Hi there,
ntpdate_sync_once.sh is an historic (and arcane) bit of software that defers NTP daemon start and sets the right time through ntpdate before starting NTP daemon itself. Some of this stems from the fact that NTP daemon may only slowly update time or fail to do so if drift is too wide.
Problem is it looks "broken" once you startup the device or after applying interface settings etc., but in the end ntpdate finishes and ntpd starts properly without further need to handle it.
Cheers,
Franco
Quote
in the end ntpdate finishes and ntpd starts properly without further need to handle it.
So how long does this take? Because this system is now at a week of uptime, and repeated ntpdate processes are still spawning, blocking ntpd from starting.
It should take 30 seconds. If it takes longer it indicates possible connectivity problems with ntpdate?
Cheers,
Franco
Quote from: franco on July 29, 2020, 01:29:16 PM
If it takes longer it indicates possible connectivity problems with ntpdate?
Don't know about that:
root@opnsense:~ # ntpdate -q pool.ntp.org
server 64.225.34.103, stratum 2, offset 0.279882, delay 0.10892
server 162.159.200.123, stratum 3, offset 0.290670, delay 0.04358
server 209.50.63.74, stratum 4, offset 0.290167, delay 0.10580
server 138.68.46.177, stratum 2, offset 0.288142, delay 0.10635
29 Jul 07:30:11 ntpdate[23722]: adjust time server 138.68.46.177 offset 0.288142 sec
But still, ntpdate processes keep starting:
root@opnsense:~ # sockstat -l | grep :123
root ntpdate 35153 4 udp6 *:123 *:*
root ntpdate 35153 5 udp4 *:123 *:*
root@opnsense:~ # sockstat -l | grep :123
root@opnsense:~ # sockstat -l | grep :123
root ntpdate 14746 4 udp6 *:123 *:*
root ntpdate 14746 5 udp4 *:123 *:*
root@opnsense:~ # sockstat -l | grep :123
root ntpdate 31092 4 udp6 *:123 *:*
root ntpdate 31092 5 udp4 *:123 *:*
I could, of course, try to reboot the system--but that sounds like an awfully Windows-y troubleshooting step.
Something in this loop isn't quite right then:
https://github.com/opnsense/core/blob/master/src/etc/inc/plugins.inc.d/ntpd/ntpdate_sync_once.sh#L13
Would be interested in the log:
# clog /var/log/ntpd.log | grep attempts
Cheers,
Franco
root@opnsense:~ # clog /var/log/ntpd.log | grep attempts
Jul 21 16:11:25 opnsense ntp: Giving up on time sync after 3 attempts.
Jul 21 17:28:14 opnsense ntp: Successfully synced time after 1 attempts.
Jul 21 17:36:33 opnsense ntp: Successfully synced time after 1 attempts.
Jul 21 17:38:57 opnsense ntp: Successfully synced time after 1 attempts.
Jul 21 17:41:13 opnsense ntp: Successfully synced time after 1 attempts.
Jul 21 18:50:34 opnsense ntp: Successfully synced time after 1 attempts.
Jul 21 18:52:43 opnsense ntp: Successfully synced time after 1 attempts.
It's probably noteworthy, though, that there's nothing at all in ntpd.log since 24 July:
Jul 23 10:38:29 opnsense ntpdate[33552]: 2001:470:88f6:: rate limit response from server.
Jul 23 10:38:39 opnsense ntpdate[33552]: 2001:19f0:5:daf::2 rate limit response from server.
Jul 23 13:19:38 opnsense ntpdate[89457]: 12.167.151.1 rate limit response from server.
Jul 23 14:36:34 opnsense ntpdate[56908]: 64.225.34.103 rate limit response from server.
Jul 24 01:08:58 opnsense ntpdate[71733]: 142.147.92.5 rate limit response from server.
Jul 24 04:17:43 opnsense ntpdate[73287]: 206.55.191.142 rate limit response from server.
Jul 24 05:09:01 opnsense ntpdate[24693]: 64.142.54.12 rate limit response from server.
Jul 24 06:50:46 opnsense ntpdate[47278]: 192.111.144.114 rate limit response from server.
Jul 24 06:50:56 opnsense ntpdate[6556]: 192.111.144.114 rate limit response from server.
CLOG}???
Ok, let's see what's wrong...
# sh -x /usr/local/etc/inc/plugins.inc.d/ntpd/ntpdate_sync_once.sh
The script isn't handling ntpdate being stuck very well. Might be enough to run the following to confirm:
# killall ntpdate
Cheers,
Franco
root@opnsense:~ # sh -x /usr/local/etc/inc/plugins.inc.d/ntpd/ntpdate_sync_once.sh
+ NOTSYNCED=true
+ MAX_ATTEMPTS=3
+ /bin/cat /conf/config.xml
+ /usr/bin/grep timeservers
+ /usr/bin/cut '-d>' -f2
+ /usr/bin/cut '-d<' -f1
+ SERVER='0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org'
+ [ '0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org' '=' '' ]
+ /bin/pkill -f ntpdate_sync_once.sh
+ ATTEMPT=1
+ [ true '=' true ]
+ [ 1 -le 3 ]
+ [ true ]
+ /usr/bin/killall ntpdate
+ /bin/pgrep ntpd
+ [ 1 -eq 0 ]
+ break
+ sleep 1
Terminated
Edit:
root@opnsense:~ # killall ntpdate
root@opnsense:~ # sh -x /usr/local/etc/inc/plugins.inc.d/ntpd/ntpdate_sync_once.sh
+ NOTSYNCED=true
+ MAX_ATTEMPTS=3
+ /bin/cat /conf/config.xml
+ /usr/bin/cut '-d>' -f2
+ /usr/bin/cut '-d<' -f1
+ /usr/bin/grep timeservers
+ SERVER='0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org'
+ [ '0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org' '=' '' ]
+ /bin/pkill -f ntpdate_sync_once.sh
+ ATTEMPT=1
+ [ true '=' true ]
+ [ 1 -le 3 ]
+ [ true ]
+ /usr/bin/killall ntpdate
+ /bin/pgrep ntpd
+ [ 1 -eq 0 ]
+ break
+ sleep 1
+ /usr/local/sbin/ntpdate -s -t 5 0.pool.ntp.org 1.pool.ntp.org 2.pool.ntp.org 3.pool.ntp.org
Terminated
root@opnsense:~ #
PS: I hate this forum software! Why does it return you to the forum index once you save a post?
Do you see segmentation faults in "dmesg" output?
root@opnsense:~ # dmesg | grep -i fault
root@opnsense:~ #
grep "core dumped" please, looks like it doesn't use "fault" in the message, e.g.
pid 63091 (syslog-ng), jid 0, uid 0: exited on signal 11 (core dumped)
Cheers,
Franco
root@opnsense:~ # dmesg | grep -i "core dumped"
root@opnsense:~ #
Quote from: danb35 on July 29, 2020, 01:33:45 PM
I could, of course, try to reboot the system--but that sounds like an awfully Windows-y troubleshooting step.
Well, not having anything else to try, I did this. No change; ntpdate process continue to spawn, ntpd won't start, and /var/log/ntpd.log isn't updating. I do note that the reported time is off by -4 hours, which coincidentally enough is my offset from UTC.
Edit: I just upgraded to 20.7, so I'm not sure if I should continue the thread here. But here's what the log looked like after the reboot:
Jul 31 09:16:51 opnsense ntpdate[12533]: Can't find host 0.pool.ntp.org: Name does not resolve (8)
Jul 31 09:16:54 opnsense ntpdate[12533]: Can't find host 2.pool.ntp.org: Name does not resolve (8)
Jul 31 09:16:56 opnsense ntpdate[12533]: no servers can be used, exiting
Jul 31 09:17:25 opnsense ntpdate[94626]: Can't find host 0.pool.ntp.org: Name does not resolve (8)
Jul 31 13:17:54 opnsense ntpdate[94626]: step time server 192.241.146.233 offset +14409.531949 sec
Jul 31 13:17:54 opnsense ntp[16779]: Successfully synced time after 2 attempts.
Jul 31 13:17:54 opnsense ntp[78028]: Starting NTP Daemon.
Jul 31 13:17:54 opnsense ntpd[48610]: ntpd 4.2.8p15@1.3728-o Tue Jul 28 02:25:36 UTC 2020 (1): Starting
Jul 31 13:17:54 opnsense ntpd[48610]: Command line: /usr/local/sbin/ntpd -g -c /var/etc/ntpd.conf -p /var/run/ntpd.pid
Jul 31 13:17:54 opnsense ntpd[48610]: ----------------------------------------------------
Jul 31 13:17:54 opnsense ntpd[48610]: ntp-4 is maintained by Network Time Foundation,
Jul 31 13:17:54 opnsense ntpd[48610]: Inc. (NTF), a non-profit 501(c)(3) public-benefit
Jul 31 13:17:54 opnsense ntpd[48610]: corporation. Support and training for ntp-4 are
Jul 31 13:17:54 opnsense ntpd[48610]: available at https://www.nwtime.org/support
Jul 31 13:17:54 opnsense ntpd[48610]: ----------------------------------------------------
Jul 31 13:17:54 opnsense ntpd[55576]: proto: precision = 0.288 usec (-22)
Jul 31 13:17:54 opnsense ntpd[55576]: basedate set to 2020-07-16
Jul 31 13:17:54 opnsense ntpd[55576]: gps base set to 2020-07-19 (week 2115)
Jul 31 13:17:54 opnsense ntpd[55576]: restrict: 'monitor' cannot be disabled while 'limited' is enabled
Jul 31 13:17:54 opnsense ntpd[55576]: Listen and drop on 0 v6wildcard [::]:123
Jul 31 13:17:54 opnsense ntpd[55576]: Listen and drop on 1 v4wildcard 0.0.0.0:123
Jul 31 13:17:54 opnsense ntpd[55576]: Listen normally on 2 igb2 [fe80::208:a2ff:fe0a:d506%3]:123
Jul 31 13:17:54 opnsense ntpd[55576]: Listen normally on 3 igb2 192.168.1.1:123
Jul 31 13:17:54 opnsense ntpd[55576]: Listen normally on 4 lo0 [::1]:123
Jul 31 13:17:54 opnsense ntpd[55576]: Listen normally on 5 lo0 127.0.0.1:123
Jul 31 13:17:54 opnsense ntpd[55576]: Listen normally on 6 igb2_vlan107 [fe80::208:a2ff:fe0a:d506%9]:123
Jul 31 13:17:54 opnsense ntpd[55576]: Listen normally on 7 igb2_vlan107 192.168.107.1:123
Jul 31 13:17:54 opnsense ntpd[55576]: Listening on routing socket on fd #28 for interface updates
Jul 31 13:18:53 opnsense ntpd[55576]: ntpd exiting on signal 15 (Terminated)
Jul 31 13:18:53 opnsense ntpd[55576]: 162.159.200.1 local addr 192.168.1.1 -> <null>
Jul 31 13:18:53 opnsense ntpd[55576]: 23.131.160.7 local addr 192.168.1.1 -> <null>
Jul 31 13:18:53 opnsense ntpd[55576]: 2600:3c01::f03c:92ff:fe12:add4 local addr fe80::208:a2ff:fe0a:d506%3 -> <null>
Jul 31 13:18:53 opnsense ntpd[55576]: 90.187.99.165 local addr 192.168.1.1 -> <null>
It looks to me like ntpdate_sync_once.sh completed successfully, ntpd started, and about a minute later, something killed it. And I'm back to ntpdate starting over and over, and nothing at all being written to ntpd.log.
Any other ideas? Or since this problem is still present in 20.7, should I post this issue in that forum?
How about new chrony plugin (devel)? Go to CLI and type "pkg install os-chrony-devel".
The just disable ntp and do the rest in chrony .. should be more solid than ntpd and I need some beta testers ;)
Since this looks to me like a bug, reported here:
https://github.com/opnsense/core/issues/4245
Quote from: mimugmail on August 06, 2020, 01:49:42 PM
How about new chrony plugin (devel)?
I'll consider it, but I'd still like to figure out why this one isn't working. What does Chrony do differently that might make it a better solution?
So apparently the official answer (https://twitter.com/opnsense/status/1293624624606072834 - it's on Twitter, so it must be official, right?) is "we don't know what's going on, we don't know how to track it down, and we can't be bothered to try to figure it out." Disappointing.
For some updates--the problem is present in 20.1.9 on a Netgate RCC-VE-2440. It's still there when I upgrade that installation to 20.7. It's also present on a clean install of 20.7 on a Protectli FW4B, with the configuration file imported from the previous system. So maybe it's something that's only affecting me, but it's doing it across two different releases of the software and on two different systems.
So, since this seems to be at a dead end, on to Chrony. It installs just fine. It starts up and appears to run without errors, which puts it ahead of ntpd. I'm noticing what seems to be one bug, though--there's a field in the GUI for "Allowed Networks", but it only lets me enter one network--when I enter a second one, I get a validation error.
> we don't know what's going on, we don't know how to track it down, and we can't be bothered to try to figure it out.
To be frank, it's not what the Twitter response actually says.
Looking at https://blog.fosketts.net/2015/09/21/the-ideal-pfsense-platform-netgate-rcc-ve-2440/ it would seem that "Rangeley dual-core Atom 1.7 GHz CPU" is pretty thin and the Twitter response already states that both Ntpd and OpenVPN issues may be related to a sluggish system performance. We also point to a ticket where the OpenVPN issue is explained and log messages exist to diagnose this particular problem.
I can't spend time on this personally beyond a certain amount of community help as I am neither paid in something to do with OPNsense nor willing to cut short my off-work family time. I'm sorry.
Maybe someone else can. There is also a commercial offering and it is usually pretty good. But I feel that is not the point of this live debugging session anymore.
Cheers,
Franco
Can you Open a Bug Report in GitHub regarding Allowed Networks? I fix this after vacation