Unbound does not resolve queries after reboot (broken default GW selection)

Started by klosz007, January 28, 2023, 10:10:45 PM

Hi again,

After a tedious investigation I found the root cause of this, but I still believe it is a bug (or at least a change in behavior) in 22.7.10+.

I realized that after each reboot I am not able to ping anything from OPNsense for some time (the duration varies with each reboot; it can be as short as 30 seconds or as long as 2 minutes). I cannot ping anything in my two LANs, I cannot ping my WAN LTE router, not to mention the Internet.

Any attempt to run ping from the firewall results in this weird message:

ping: sendto: permission denied

This has consequences: monitoring of both WAN gateways fails, so they are considered "down", and the OpenVPN gateway is selected as the default gateway. Without a proper default gateway, Unbound cannot contact the root servers, so it cannot start up.

After that "some time", pings eventually start to work, then both WAN gateways recover and come online, and finally my primary WAN/DSL gateway becomes the active one. Unfortunately Unbound cannot recover on its own from this situation (prolonged lack of access to the root servers) and has to be restarted manually, after which it comes up instantly.
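As a stopgap for the manual restart described above, the "wait until pings work, then restart Unbound" step could be automated. The following is only a minimal sketch under assumptions: the function names are hypothetical, `configctl unbound restart` is assumed to be an appropriate restart command on the firewall, and the probe address in the usage comment is a placeholder that would need to be replaced with a host behind the real WAN gateway.

```python
import subprocess
import time
from typing import Callable


def ping_ok(host: str, timeout: float = 5.0) -> bool:
    """Return True if one ICMP echo request to `host` gets a reply."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", host],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, OSError):
        # ping hung or could not be started: treat as "not reachable yet"
        return False
    return result.returncode == 0


def restart_unbound() -> None:
    """Restart Unbound.

    Assumption: `configctl unbound restart` is the right command here;
    substitute whatever restart method you normally use.
    """
    subprocess.run(["configctl", "unbound", "restart"], check=False)


def restart_when_reachable(
    probe: Callable[[], bool],
    restart: Callable[[], None],
    attempts: int = 30,
    delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `probe` until it succeeds, then call `restart` exactly once.

    Returns True if the restart was triggered, False if every attempt failed.
    """
    for attempt in range(attempts):
        if probe():
            restart()
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


# Hypothetical usage (192.0.2.1 is a documentation placeholder address):
# restart_when_reachable(lambda: ping_ok("192.0.2.1"), restart_unbound)
```

This only works around the symptom; it does not explain why pings fail in the first place.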


So why is this happening? I realized that if I reset my OPNsense config to default and go through at least a minimal setup, then this issue is not there. So it must be something in my (rather complex) configuration.

It took me a lot of time to find this, and it all seems to be related to the shaper. I created the shaper config (quite a simple setup, like the one described here: https://maltechx.de/en/2021/03/opnsense-setup-traffic-shaping-and-reduce-bufferbloat) a long time ago to resolve bufferbloat issues and have not touched it since.

It is sufficient to just create and enable pipes, without any queues or rules even created, for this weird issue to appear in 22.7.10+.
If I delete (or disable) the pipes, everything goes back to normal. As soon as they are created and enabled (again, without queues or rules created or enabled), the issue instantly returns: it is not possible to ping anything after each reboot.
And as said previously, it does not happen in 22.7.9; all versions starting from 22.7.10 are affected for me.

So for the moment I have just disabled the shaper. I'm on 23.1.6 and all is fine.

Any ideas why this is happening?

Maybe I was lucky with 22.7.* as I never had any issues.

23.1, however, is a different story. My computers all pass traffic through OPNsense, but pinging domain names from the OPNsense interface itself fails. Why OPNsense can ping the IP in the audit below and get a reply is puzzling.

Do I need to revert back to 22.*?

***GOT REQUEST TO AUDIT CONNECTIVITY***
Currently running OPNsense 23.1_6 at Sun Feb 12 15:36:29 CST 2023
Checking connectivity for host: pkg.opnsense.org -> 89.149.211.205
PING 89.149.211.205 (89.149.211.205): 1500 data bytes
1508 bytes from 89.149.211.205: icmp_seq=0 ttl=46 time=130.093 ms
1508 bytes from 89.149.211.205: icmp_seq=1 ttl=46 time=119.261 ms
1508 bytes from 89.149.211.205: icmp_seq=2 ttl=46 time=129.910 ms
1508 bytes from 89.149.211.205: icmp_seq=3 ttl=46 time=252.718 ms

--- 89.149.211.205 ping statistics ---
4 packets transmitted, 4 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 119.261/157.996/252.718/54.863 ms
Checking connectivity for repository (IPv4): https://pkg.opnsense.org/FreeBSD:13:amd64/23.1
Updating OPNsense repository catalogue...
pkg: https://pkg.opnsense.org/FreeBSD:13:amd64/23.1/latest/meta.txz: No address record
repository OPNsense has no meta file, using default settings
pkg: https://pkg.opnsense.org/FreeBSD:13:amd64/23.1/latest/packagesite.pkg: No address record
pkg: https://pkg.opnsense.org/FreeBSD:13:amd64/23.1/latest/packagesite.txz: No address record
Unable to update repository OPNsense
Error updating repositories!
Checking connectivity for host: pkg.opnsense.org -> 2001:1af8:4f00:a005:5::
ping: UDP connect: No route to host
Checking connectivity for repository (IPv6): https://pkg.opnsense.org/FreeBSD:13:amd64/23.1
Updating OPNsense repository catalogue...
pkg: https://pkg.opnsense.org/FreeBSD:13:amd64/23.1/latest/meta.txz: Non-recoverable resolver failure
repository OPNsense has no meta file, using default settings
pkg: https://pkg.opnsense.org/FreeBSD:13:amd64/23.1/latest/packagesite.pkg: Non-recoverable resolver failure
pkg: https://pkg.opnsense.org/FreeBSD:13:amd64/23.1/latest/packagesite.txz: Non-recoverable resolver failure
Unable to update repository OPNsense
Error updating repositories!
***DONE***
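One way to read an audit like the one above is to separate ICMP results from resolver errors: replies from the IP address combined with "No address record" or "Non-recoverable resolver failure" lines point at name resolution rather than IPv4 routing. The snippet below is a rough sketch of that heuristic only; the class and function names are hypothetical, and the matched strings are simply the ones that appear in the log above.

```python
import re
from dataclasses import dataclass


@dataclass
class AuditSummary:
    icmp_replies: int      # total "packets received" across ping runs
    resolver_errors: int   # lines reporting a name resolution failure
    route_errors: int      # lines reporting "No route to host"

    @property
    def looks_like_dns_failure(self) -> bool:
        # Heuristic: the address answers pings, yet lookups by name fail.
        return self.icmp_replies > 0 and self.resolver_errors > 0


_RECEIVED = re.compile(r"(\d+) packets transmitted, (\d+) packets received")
_RESOLVER_MARKERS = ("No address record", "Non-recoverable resolver failure")


def summarize_audit(log: str) -> AuditSummary:
    """Count ping replies, resolver errors and route errors in audit text."""
    lines = log.splitlines()
    icmp = sum(int(m.group(2)) for m in _RECEIVED.finditer(log))
    resolver = sum(
        1 for line in lines if any(marker in line for marker in _RESOLVER_MARKERS)
    )
    route = sum(1 for line in lines if "No route to host" in line)
    return AuditSummary(icmp, resolver, route)
```

Fed the audit text above, this would count 4 ICMP replies, 6 resolver error lines and 1 "No route to host" line (the IPv6 check), which is consistent with DNS on the firewall itself not resolving while IPv4 routing works.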

Some of this reboot-related breakage might be what https://github.com/opnsense/core/commit/6d22e7b68a2a fixed last week; that fix is going into 23.1.1.

# opnsense-patch 6d22e7b68a2a

It looks like this is influenced by FreeBSD 13 (in use since 22.1), on top of one security fix and one reliability fix, both in 22.7, which together surfaced the problem when default gateway switching was enabled and the gateway it chose was a static address...


Cheers,
Franco


Simple answer: yesterday, perhaps?

But maybe the question wasn't precise enough regarding which "patch". The commit as published above? A release? A patch for a different issue? A follow up patch for the patch for "reasons"?


Cheers,
Franco

So to skip your smartassery, has that patch been applied to the release, or not yet? The version hasn't changed to indicate something has been fixed, unless it's too small to warrant such..?

I'm not sure there is anything to add here that can't be read from my last two replies. ;)

And if you want to report a new bug feel free, but don't use and derail an ongoing discussion.

Your behaviour has been noted and so consider this as a warning to remain civil.


Cheers,
Franco

Pretty confident we cleared the air via PM. The question here still stands: is 23.1.1 behaving better in this regard or not? If not, it might be better to open a fresh GitHub issue to add technical depth and hopefully steps to reproduce.


Cheers,
Franco