OPNsense Forum

English Forums => General Discussion => Topic started by: patrick3000 on November 22, 2023, 07:50:56 AM

Title: NTP not working (causes problems)
Post by: patrick3000 on November 22, 2023, 07:50:56 AM
I have OPNsense installed as a VM running on top of something similar to Proxmox (TrueNAS SCALE, actually, but for these purposes you can treat it as Proxmox, since both run KVM on Debian).

Recently, it has been acting strangely and crashed once at random. Upon further investigation, I noticed that the NTP service is not working, so I don't think OPNsense can keep accurate time. The status of all servers is "Unreach/Pending," and the NTP log file contains a number of entries saying "unable to bind to wildcard address :: - another process may be running - EXITING."

So, it appears that OPNsense is not able to access any NTP server and cannot get the time, which is probably the source of the problems. However, when I manually queried one of the servers in the pool from the Shell with "ntpdate -q 0.opnsense.pool.ntp.org" I got the result "server 167.248.62.201, stratum 3, offset +0.000000, delay 0.04958" (and some other stuff).

So, it seems that the NTP servers can be reached manually, but for some reason, the NTP service in OPNsense isn't working properly.
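
For anyone repeating the manual check above, here is a minimal Python sketch that parses the "server ..." summary lines printed by "ntpdate -q", in the format quoted in the post. The function names are made up for illustration, and treating stratum 0 as "no usable answer" is an assumption about how unanswered servers are reported, not something confirmed in this thread.

```python
import re

# Matches summary lines in the form quoted above, e.g.
# "server 167.248.62.201, stratum 3, offset +0.000000, delay 0.04958"
_SERVER_LINE = re.compile(
    r"server\s+(?P<host>[^,\s]+),\s*stratum\s+(?P<stratum>\d+),\s*"
    r"offset\s+(?P<offset>[+-]?\d+(?:\.\d+)?),\s*"
    r"delay\s+(?P<delay>\d+(?:\.\d+)?)"
)


def parse_ntpdate_query(output):
    """Return one dict per 'server ...' line found in the given text."""
    results = []
    for line in output.splitlines():
        match = _SERVER_LINE.search(line)
        if match:
            results.append({
                "host": match.group("host"),
                "stratum": int(match.group("stratum")),
                "offset": float(match.group("offset")),
                "delay": float(match.group("delay")),
            })
    return results


def reachable_servers(results):
    """Keep servers with a non-zero stratum.

    Assumption: a server that never answered shows up with stratum 0.
    """
    return [r for r in results if r["stratum"] > 0]
```

If reachable_servers() returns entries while the NTP service still shows every peer as "Unreach/Pending," the upstream servers are probably fine and the problem is more likely local to the service or the clock.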

I'm considering reinstalling and restoring from config, but I'd rather not because there are a few VLANs and other interfaces, and matching everything up will take some work.

Any thoughts on what could be causing NTP not to work and how I can troubleshoot this?
Title: Re: NTP not working (causes problems)
Post by: patrick3000 on November 22, 2023, 08:48:13 AM
This appears to be a false alarm. For some reason, NTP is working again with an active peer. I'm not sure why, but I deleted some servers in the pool a couple of hours ago and returned to the four default servers, and maybe that fixed it.

I'll check again tomorrow and make sure there is still an active peer, but for now this appears to be solved.
Title: Re: NTP not working (causes problems)
Post by: CJ on November 22, 2023, 09:17:05 PM
Check which clock source OPNsense is using. If the clock drifts too far, NTP will just give up and stop syncing.

I'm not a fan of virtual OPNsense, so I'm not sure what settings you should use.
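
To illustrate the "give up" behaviour: a simplified Python sketch of how ntpd reacts to a measured offset, assuming the stock defaults of a 0.128 s step threshold and a 1000 s panic threshold. The function name and the allow_first_big_step parameter (standing in for ntpd's -g option) are invented for this example. The real daemon also waits before stepping and tracks frequency error, which is not modelled here.

```python
# Assumed ntpd defaults; both can be changed with "tinker" in ntp.conf.
STEP_THRESHOLD_S = 0.128    # above this, ntpd steps the clock instead of slewing
PANIC_THRESHOLD_S = 1000.0  # above this, ntpd refuses to correct and exits


def ntpd_reaction(offset_seconds, allow_first_big_step=False):
    """Classify an offset as 'slew', 'step' or 'panic' (simplified model).

    allow_first_big_step mimics starting ntpd with -g, which permits one
    initial correction larger than the panic threshold.
    """
    magnitude = abs(offset_seconds)
    if magnitude > PANIC_THRESHOLD_S and not allow_first_big_step:
        return "panic"
    if magnitude > STEP_THRESHOLD_S:
        return "step"
    return "slew"
```

So a VM whose clock has run more than about 1000 seconds off would land in the "panic" case, which matches the symptom of NTP no longer syncing.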
Title: Re: NTP not working (causes problems)
Post by: patrick3000 on November 23, 2023, 08:21:49 AM
I have learned more about this. In particular, it appears that when a network adapter is passed to the OPNsense VM using PCI pass-through, and that adapter is then assigned to an interface (such as WAN) that is used for reaching an NTP server, there can be problems getting NTP to work. During boot, the following message appears in the console: "Statistical lapic calibration failed! Clocks might be ticking at variable rates." NTP is then unable to get an active peer.
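
In case it helps someone check for the same symptoms, here is a rough Python sketch that scans saved log or console text for the two messages quoted in this thread. The symptom labels and the function name are made up for illustration, and no log file path is assumed: pass in whatever text you have captured.

```python
# Message fragments quoted in this thread; the dictionary keys are
# hypothetical labels chosen for this example.
SYMPTOMS = {
    "ntpd_bind_conflict": "unable to bind to wildcard address",
    "lapic_calibration": "Statistical lapic calibration failed",
}


def find_symptoms(log_text):
    """Return {label: [matching lines]} using case-insensitive matching."""
    found = {label: [] for label in SYMPTOMS}
    for line in log_text.splitlines():
        lowered = line.lower()
        for label, needle in SYMPTOMS.items():
            if needle.lower() in lowered:
                found[label].append(line.strip())
    return found
```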

When the adapter to be used for accessing NTP servers is passed through to OPNsense as a virtual adapter, rather than using PCI pass-through, this problem does not occur.

This is in TrueNAS SCALE, but it might also be an issue in Proxmox, since both use KVM on top of Debian.

In any event, I have solved the problem by using virtual adapter pass-through rather than PCI pass-through but thought it would be worth documenting in case anyone else has this problem.
Title: Re: NTP not working (causes problems)
Post by: meyergru on November 23, 2023, 12:48:58 PM
I doubt that you have identified the root cause of the problem, even if your fix apparently works.

There are many potential problems with timekeeping on virtual machines, most of which are covered in this excellent document (https://www.vmware.com/files/pdf/techpaper/Timekeeping-In-VirtualMachines.pdf). You can find more FreeBSD specific information here (https://www.thomas-krenn.com/de/wiki/FreeBSD_Timecounters).

The error message you saw is pointing towards the local APIC calibration going wrong. This in turn may be influenced by the VM configuration, but depending on the CPU pressure on your host, this problem may turn up anyway.

I think (i.e. I do not know for sure) that ntpd also performs some basic checks on whether the local timekeeping is sane. If it decides that it is not, it will exit or refuse to synchronize.

There are other threads (https://forum.opnsense.org/index.php?topic=18557) suggesting that on FreeBSD you can manually select the clock source by changing kern.timecounter.hardware (https://man.freebsd.org/cgi/man.cgi?query=timecounters&sektion=4&apropos=0&manpath=FreeBSD+13.1-RELEASE+and+Ports) to a more reliable entry from the list in kern.timecounter.choice. Probably, in a situation where ntpd works for you, the kernel has already picked a better clock source, which you could then pin in the tunables, potentially avoiding the need for a virtualized network adapter. You can also lower the virtualization overhead by setting kern.hz=100.
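
As a sketch of what choosing from kern.timecounter.choice can look like, the Python below parses the "NAME(quality)" list printed by "sysctl kern.timecounter.choice" and returns the highest-quality entry. The function names are invented for this example, and the sample string in the usage note is illustrative, not output from the system discussed here. Entries with negative quality are skipped because the kernel never selects those automatically.

```python
import re

# Each entry looks like "HPET(950)" or "dummy(-1000000)".
_CHOICE = re.compile(r"(?P<name>[^\s()]+)\((?P<quality>-?\d+)\)")


def parse_timecounter_choice(text):
    """Map timecounter name -> quality from sysctl output text."""
    return {m.group("name"): int(m.group("quality"))
            for m in _CHOICE.finditer(text)}


def best_timecounter(choices, exclude=()):
    """Return the highest-quality name, ignoring negative-quality entries
    and anything listed in exclude (e.g. a source known to misbehave)."""
    candidates = {name: quality for name, quality in choices.items()
                  if quality >= 0 and name not in exclude}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)
```

For example, with the illustrative text "kern.timecounter.choice: ACPI-fast(900) HPET(950) i8254(0) TSC-low(1000) dummy(-1000000)", best_timecounter() picks TSC-low, or HPET if TSC-low is excluded. The chosen name can then be tried at runtime with "sysctl kern.timecounter.hardware=NAME" before making it permanent as a tunable.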
Title: Re: NTP not working (causes problems)
Post by: patrick3000 on November 23, 2023, 06:47:51 PM
meyergru, thanks. That's helpful. I will read the documents on timekeeping that you linked to. Overall, I have been very pleased with switching from bare-metal OPNsense to virtual OPNsense, which I did a couple of months ago. Less hardware to maintain. Less shelf space used. Less energy consumed. Less cabling. But it does appear that there are some tricky timekeeping issues to deal with.
Title: Re: NTP not working (causes problems)
Post by: patrick3000 on November 23, 2023, 06:57:04 PM
FYI, the link to timekeeping in FreeBSD is in a language other than English. Not sure whether it's German or Dutch, but I can't read it.
Title: Re: NTP not working (causes problems)
Post by: chemlud on November 23, 2023, 07:15:06 PM
Quote from: patrick3000 on November 23, 2023, 06:57:04 PM
FYI, the link to timekeeping in FreeBSD is in a language other than English. Not sure whether it's German or Dutch, but I can't read it.

https://www.thomas-krenn.com/de/wiki/FreeBSD_Timecounters is in German.

The latest Firefox has a (beta) translation function that works pretty well, as far as I can see.
Title: Re: NTP not working (causes problems)
Post by: patrick3000 on November 23, 2023, 07:43:09 PM
Thanks. Firefox translation worked well.
Title: Re: NTP not working (causes problems)
Post by: CJ on November 26, 2023, 06:11:45 PM
Quote from: meyergru on November 23, 2023, 12:48:58 PM
I doubt that you have identified the root cause of the problem, even if your fix apparently works.

There are many potential problems with timekeeping on virtual machines, most of which are covered in this excellent document (https://www.vmware.com/files/pdf/techpaper/Timekeeping-In-VirtualMachines.pdf). You can find more FreeBSD specific information here (https://www.thomas-krenn.com/de/wiki/FreeBSD_Timecounters).

The error message you saw is pointing towards the local APIC calibration going wrong. This in turn may be influenced by the VM configuration, but depending on the CPU pressure on your host, this problem may turn up anyway.

I think (i.e. I do not know for sure) that ntpd also performs some basic checks on whether the local timekeeping is sane. If it decides that it is not, it will exit or refuse to synchronize.

There are other threads (https://forum.opnsense.org/index.php?topic=18557) suggesting that on FreeBSD you can manually select the clock source by changing kern.timecounter.hardware (https://man.freebsd.org/cgi/man.cgi?query=timecounters&sektion=4&apropos=0&manpath=FreeBSD+13.1-RELEASE+and+Ports) to a more reliable entry from the list in kern.timecounter.choice. Probably, in a situation where ntpd works for you, the kernel has already picked a better clock source, which you could then pin in the tunables, potentially avoiding the need for a virtualized network adapter. You can also lower the virtualization overhead by setting kern.hz=100.

That's what I was referring to when I mentioned checking the clock. powerd was causing mine to get out of sync until I changed the clock source using the timecounter tunable.