NTP stuck in unsync state after 20.7.5

Ricardo · November 30, 2020, 11:40:06 AM

Quote from: Fright on November 26, 2020, 03:44:46 PM
hm.
can you try to force sync via shell?
ntpdate -u 193.225.126.76

I did it:

root@localhost:~ # ntpdate -u 148.6.0.1
30 Nov 11:33:56 ntpdate[4218]: adjust time server 148.6.0.1 offset -0.070479 sec

But the status page still shows:

Unreach/Pending 148.6.0.1 .INIT. 16 u - 512 0 0.000 +0.000 0.000
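For anyone reading that status row later, here is a minimal Python sketch that splits it into named fields. It assumes the status page uses the same column order as `ntpq -p` (remote, refid, stratum, type, when, poll, reach, delay, offset, jitter) with the GUI status label in front; the names `PeerRow`, `parse_peer_row` and `never_answered` are made up for illustration. The reach column is an 8-bit shift register printed in octal, so 0 means none of the last eight polls was answered, and stratum 16 means unsynchronized.

```python
from dataclasses import dataclass


@dataclass
class PeerRow:
    status: str
    remote: str
    refid: str
    stratum: int
    peer_type: str
    when: str      # "-" when no packet has been received yet
    poll: int      # poll interval in seconds
    reach: int     # 8-bit reachability register (parsed from octal)
    delay: float
    offset: float
    jitter: float


def parse_peer_row(line: str) -> PeerRow:
    """Split one whitespace-separated status row into typed fields."""
    f = line.split()
    if len(f) != 11:
        raise ValueError(f"expected 11 fields, got {len(f)}")
    return PeerRow(
        status=f[0], remote=f[1], refid=f[2], stratum=int(f[3]),
        peer_type=f[4], when=f[5], poll=int(f[6]),
        reach=int(f[7], 8),  # reach is displayed in octal
        delay=float(f[8]), offset=float(f[9]), jitter=float(f[10]),
    )


def never_answered(row: PeerRow) -> bool:
    """True if no reply arrived in the last eight polls and the peer is unsynchronized."""
    return row.reach == 0 and row.stratum == 16


ROW = "Unreach/Pending 148.6.0.1 .INIT. 16 u - 512 0 0.000 +0.000 0.000"
```

Calling `never_answered(parse_peer_row(ROW))` on the row above says that ntpd has not received a single reply from 148.6.0.1, even though ntpdate reached the same server.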

The GUI (Services: Network Time: Log File) does not show anything regarding ongoing attempts. I have no clue what additional logging should be enabled; I already enabled all of this:

Network time \ General \ Syslog logging:
Enable logging of peer messages (default: disabled). --> CHECKED
Enable logging of system messages (default: disabled). --> CHECKED

Statistics logging (these options will create persistent daily log files in /var/log/ntp):
Enable logging of reference clock statistics (default: disabled). --> CHECKED
Enable logging of clock discipline statistics (default: disabled). --> CHECKED
Enable logging of NTP peer statistics (default: disabled). --> CHECKED

Fright · November 30, 2020, 02:40:50 PM

can you restart ntpd and take a look at firewall-diagnostics-statesdump (with :123 filter)? what source IP does ntpd use?
or just try this one?
https://forum.netgate.com/topic/131506/ntp-not-working-solved-totally

Ricardo · December 01, 2020, 11:27:32 AM

I restarted NTPd several times; that's what can be seen in the logs I added earlier.

But all of a sudden, it recovered "by magic" yesterday @16:00:05 on both of my problematic routers:

RouterA:
2020-11-30T16:00:42   ntpd[4570]   148.6.0.1 901a 8a sys_peer
2020-11-30T16:00:32   ntpd[4570]   148.6.0.1 8014 84 reachable
2020-11-30T16:00:32   ntpd[4570]   162.159.200.1 901a 8a sys_peer
2020-11-30T16:00:22   ntpd[4570]   51.105.208.173 8014 84 reachable
2020-11-30T16:00:15   ntpd[4570]   0.0.0.0 0615 05 clock_sync
2020-11-30T16:00:15   ntpd[4570]   185.82.232.254 901a 8a sys_peer
2020-11-30T16:00:10   ntpd[4570]   162.159.200.1 8014 84 reachable
2020-11-30T16:00:05   ntpd[4570]   185.82.232.254 8014 84 reachable

------------------------

RouterB:

2020-11-30T22:12:32   ntpd[63817]   148.6.0.1 961d 8d popcorn 0.000441 s
2020-11-30T15:59:00   ntpd[63817]   0.0.0.0 c615 05 clock_sync
2020-11-30T15:59:00   ntpd[63817]   0.0.0.0 c612 02 freq_set kernel 31.454 PPM
2020-11-30T15:59:00   ntpd[63817]   kernel reports TIME_ERROR: 0x41: Clock Unsynchronized
2020-11-30T15:54:23   ntpd[63817]   51.105.208.173 8014 84 reachable
2020-11-30T15:53:27   ntpd[63817]   148.6.0.1 901a 8a sys_peer
2020-11-30T15:53:21   ntpd[63817]   148.6.0.1 8014 84 reachable
2020-11-30T15:53:21   ntpd[63817]   0.0.0.0 c618 08 no_sys_peer
2020-11-30T15:53:16   ntpd[63817]   0.0.0.0 0614 04 freq_mode
2020-11-30T15:53:23   ntpd[63817]   0.0.0.0 061c 0c clock_step -6.165238 s
2020-11-30T15:53:23   ntpd[63817]   51.105.208.173 901a 8a sys_peer
2020-11-30T15:53:13   ntpd[63817]   51.105.208.173 8014 84 reachable
2020-11-26T10:11:32   ntpd[63817]   51.105.208.173 8011 81 mobilize assoc 43919
-------------------------
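Since the GUI log is listed newest-first, it is easier to see how quickly the sync happened once the lines are parsed and sorted. Below is a rough Python sketch that does this for the RouterA lines; the function names (`parse_line`, `chronological`, `first_event`) are hypothetical, and the two hex columns are kept as opaque strings rather than decoded. Lines that do not follow the "peer status code event" layout (such as the kernel TIME_ERROR message) are simply skipped.

```python
import re
from datetime import datetime
from typing import List, NamedTuple, Optional

# timestamp, ntpd[pid], peer address, 4-hex status word, 2-hex event code, event name, optional detail
LINE_RE = re.compile(
    r"^(?P<ts>\S+)\s+ntpd\[(?P<pid>\d+)\]\s+(?P<peer>\S+)\s+"
    r"(?P<status>[0-9a-f]{4})\s+(?P<code>[0-9a-f]{2})\s+(?P<event>\S+)\s*(?P<detail>.*)$"
)


class NtpEvent(NamedTuple):
    when: datetime
    pid: int
    peer: str
    status: str
    code: str
    event: str
    detail: str


def parse_line(line: str) -> Optional[NtpEvent]:
    """Return an NtpEvent, or None if the line is not an event line."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return NtpEvent(datetime.fromisoformat(m["ts"]), int(m["pid"]), m["peer"],
                    m["status"], m["code"], m["event"], m["detail"])


def chronological(lines: List[str]) -> List[NtpEvent]:
    """Parse all event lines and sort them oldest-first."""
    events = [e for e in map(parse_line, lines) if e is not None]
    return sorted(events, key=lambda e: e.when)


def first_event(lines: List[str], name: str) -> Optional[NtpEvent]:
    """Earliest event with the given name, e.g. 'reachable' or 'clock_sync'."""
    for e in chronological(lines):
        if e.event == name:
            return e
    return None


ROUTER_A = """\
2020-11-30T16:00:42   ntpd[4570]   148.6.0.1 901a 8a sys_peer
2020-11-30T16:00:32   ntpd[4570]   148.6.0.1 8014 84 reachable
2020-11-30T16:00:32   ntpd[4570]   162.159.200.1 901a 8a sys_peer
2020-11-30T16:00:22   ntpd[4570]   51.105.208.173 8014 84 reachable
2020-11-30T16:00:15   ntpd[4570]   0.0.0.0 0615 05 clock_sync
2020-11-30T16:00:15   ntpd[4570]   185.82.232.254 901a 8a sys_peer
2020-11-30T16:00:10   ntpd[4570]   162.159.200.1 8014 84 reachable
2020-11-30T16:00:05   ntpd[4570]   185.82.232.254 8014 84 reachable
""".splitlines()
```

Comparing `first_event(ROUTER_A, "reachable").when` with `first_event(ROUTER_A, "clock_sync").when` gives the time between the first answered poll and the clock being declared in sync.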

I did not change anything in the config around that time. These 2 routers are connected through a site-to-site IPsec VPN; that is the only clue I can think of for what could have caused this NTP time sync issue. The NTP peers were partially different on the 2 routers, so I would rather rule out that the issue was on the remote time source.

Fright · December 01, 2020, 03:57:45 PM

glad it works )
the only way I managed to reproduce your behavior (no sync. log stop at "..mobilize assoc ..") is to unbind the ntpd from the interfaces with Internet access (wan).
so i think it has to do with connectivity from the selected interfaces. it's hard to say what exactly interfered with connections (next time you can try to trace the packets, check nat rules etc).

Ricardo · December 01, 2020, 06:24:52 PM

Quote from: Fright on December 01, 2020, 03:57:45 PM
glad it works )
the only way I managed to reproduce your behavior (no sync. log stop at "..mobilize assoc ..") is to unbind the ntpd from the interfaces with Internet access (wan).
so i think it has to do with connectivity from the selected interfaces. it's hard to say what exactly interfered with connections (next time you can try to trace the packets, check nat rules etc).

I don't like these automagical resolutions, as I have no idea what fixed it, and when (not if, but when..) it comes back again, I will be stuck in the same situation.

NTPd is set to listen on "all" interfaces. To be more precise, in the GUI there is "nothing" selected under the NTPd listening interface, which means it listens on everything.
https://docs.opnsense.org/manual/ntpd.html --> Interface(s): "Interfaces to bind to, when none is selected it listens to all"

Slightly offtopic:
Franco said somewhere in this forum earlier that this default (= listen on ALL interfaces) should NEVER be changed for Unbound DNS, otherwise it will cause a big clusterf*ck of issues. So I was careful not to change this for NTPd either. But unfortunately this is not in the docs in any written and discoverable form. For Unbound DNS, the docs don't warn the user either: https://docs.opnsense.org/manual/unbound.html --> "Network Interfaces:
Interface IP addresses used for responding to queries from clients. If an interface has both IPv4 and IPv6 IPs, both are used. Queries to other interface IPs not selected are discarded. The default behavior is to respond to queries on every available IPv4 and IPv6 address." --> No warning about "do NOT mess with listening interface setup, otherwise you break the whole world!"

So back to NTPd: I saw in the firewall status Live view, as the router sent out NTP packets (UDP123) towards the resolved public IPs of the time servers, using the WAN interface, using the actual WAN IP at the time of writing. So that looked also correct to me.

Fright · December 01, 2020, 09:13:39 PM

Quote: do NOT mess with listening interface setup, otherwise you break the whole world!

I agree. that's why I mentioned that I got the error only by changing the bindings.

Quote: I saw in the firewall status Live view, as the router sent out NTP packets (UDP123) towards the resolved public IPs of the time servers, using the WAN interface, using the actual WAN IP at the time of writing. So that looked also correct to me.

well, all that remains is to wait for it to happen again and try to figure it out on the second try )
i will test the log detail with the "logconfig=all" parameter in .conf. maybe useful in the future

chemlud · December 02, 2020, 08:20:10 AM

Quote from: Ricardo on December 01, 2020, 06:24:52 PM
Slightly offtopic:
Franco said somewhere in this forum earlier that this default (= listen on ALL interfaces) should NEVER be changed for Unbound DNS, otherwise it will cause a big clusterf*ck of issues. So I was careful not to change this for NTPd either. But unfortunately this is not in the docs in any written and discoverable form. For Unbound DNS, the docs don't warn the user either: https://docs.opnsense.org/manual/unbound.html --> "Network Interfaces:
Interface IP addresses used for responding to queries from clients. If an interface has both IPv4 and IPv6 IPs, both are used. Queries to other interface IPs not selected are discarded. The default behavior is to respond to queries on every available IPv4 and IPv6 address." --> No warning about "do NOT mess with listening interface setup, otherwise you break the whole world!"

I excluded WAN from the Unbound listen interfaces a long time ago. No hell has broken loose since then regarding DNS. IIRC the problem only occurs if an interface that is listed explicitly gets removed.

Fright · December 02, 2020, 08:30:08 AM

@Ricardo
so I tried, and I don't know how to make ntpd write more messages to the log (I tried both command line options and options in .conf).
what is written with the GUI options enabled seems to be the maximum..
it looks like next time we will have to rely on firewall logs and packet capture to figure out what is going on
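As a complement to a packet capture, a small probe bound to a chosen local address can show whether NTP replies come back at all via that address. This is only a sketch under assumptions: the function names are made up, it sends a plain SNTP client request (version 4, mode 3) and reads the transmit timestamp from the reply. Note that it uses an ephemeral source port, like `ntpdate -u` does, whereas ntpd itself sends from port 123, so a success here does not prove that ntpd's own packets get through.

```python
import socket
import struct
from typing import Optional

NTP_EPOCH_OFFSET = 2208988800  # seconds between 1900-01-01 (NTP epoch) and 1970-01-01 (Unix epoch)


def build_request() -> bytes:
    """48-byte SNTP client request: LI=0, VN=4, Mode=3 -> first byte 0x23, rest zero."""
    return bytes([0x23]) + bytes(47)


def parse_reply(data: bytes) -> dict:
    """Extract header fields and the transmit timestamp from an NTP reply."""
    if len(data) < 48:
        raise ValueError("NTP packet must be at least 48 bytes")
    first, stratum = data[0], data[1]
    tx_sec, tx_frac = struct.unpack("!II", data[40:48])  # transmit timestamp
    return {
        "leap": first >> 6,
        "version": (first >> 3) & 0x7,
        "mode": first & 0x7,          # 4 = server
        "stratum": stratum,
        "unix_time": tx_sec - NTP_EPOCH_OFFSET + tx_frac / 2**32,
    }


def probe(server: str, source_ip: str = "", timeout: float = 3.0) -> Optional[dict]:
    """Send one request from source_ip (empty = any) and return the parsed reply, or None on timeout."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        s.bind((source_ip, 0))  # port 0 = ephemeral source port
        s.sendto(build_request(), (server, 123))
        try:
            data, _ = s.recvfrom(512)
        except socket.timeout:
            return None
    return parse_reply(data)
```

For example, `probe("148.6.0.1", source_ip="<WAN address>")` returning `None` while the same call without `source_ip` succeeds would point at a routing, NAT or firewall problem for that interface.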

Ricardo · December 02, 2020, 02:08:28 PM

Quote from: Fright on December 02, 2020, 08:30:08 AM
@Ricardo
so I tried, and I don't know how to make ntpd write more messages to the log (I tried both command line options and options in .conf).
what is written with the GUI options enabled seems to be the maximum..
it looks like next time we will have to rely on firewall logs and packet capture to figure out what is going on

Is the same NTPd software chattier (more verbose) on other BSD / Linux platforms, if you want it to be?

Fright · December 02, 2020, 03:17:05 PM

to be honest, no idea. but I could not find a single example on Google of a log with more verbose output than the one on OPNsense (with all flags enabled in the GUI config).
