24.1.2 Wireguard does not work after updating

Started by H3n, February 20, 2024, 06:37:11 PM

I also experienced issues after the update: the Wireguard tunnel works, but routes to other subnets (VLANs) fail. I run OPNsense as a virtual machine under Proxmox and always take a snapshot before an update, so I rolled back to the previous version and everything is back to normal! So something is affecting the Wireguard tunnels and routing...

Replying here because I also have a couple of pre-existing Wireguard tunnels.

In my case, upgrading from 24.1.1 --> 24.1.2 went smoothly, without errors, and the WG tunnels continue to work as-normal post upgrade.

My overall setup is 2 SOHO networks at different sites, each with a bare-metal OPNsense install on PCengines APU2E4 box, connected to a separate VLAN-capable switch. One site has WAPs running OpenWRT, the other with Unifi firmware. (Running as pure access points, no routing, firewalling, or services on the WAPs.)

One site also has 4 VLANs and 3 different wifi SSIDs associated w/ 3 of the VLANs. Everything seems to continue to work fine.

You could try reverting this one:

https://github.com/opnsense/core/commit/3340a32473

But it's basically a can of worms because it fixes a non-operational issue on the surface, which points to lack of proper setup if it causes breakage... perhaps meddling with VIPs or a left-over interface IPv4 configuration (this has been discontinued but some old configs may still have it) which is not optimal at the moment.

# opnsense-patch 340a32473


Cheers,
Franco

with multi gateway setup, wg clients, wg servers, vlans.. no problem. i've had vpn stuck at boot only if dns race condition was a problem (e.g. adguard as a main dns; unbound can't resolve if not routed to wan).

Quote: i've had vpn stuck at boot only if dns race condition was a problem (e.g. adguard as a main dns; unbound can't resolve if not routed to wan).

I think this could also be the cause of my hang during boot.
However, it also only happens with 24.1.2.
I just have Unbound, however with "DNS over TLS" resolving to Cloudflare enabled.

Any way to dive into this? Do I just have to wait for a certain timeout? It seemed to be completely stuck at "Configuring Wireguard VPN..." and I was not able to start OPNsense at all...

February 21, 2024, 02:59:02 PM #20 Last Edit: February 21, 2024, 04:35:15 PM by franco
Quote from: franco on February 21, 2024, 02:15:36 PM
You could try reverting this one:

https://github.com/opnsense/core/commit/3340a32473

But it's basically a can of worms because it fixes a non-operational issue on the surface, which points to lack of proper setup if it causes breakage... perhaps meddling with VIPs or a left-over interface IPv4 configuration (this has been discontinued but some old configs may still have it) which is not optimal at the moment.

# opnsense-patch 3340a32473


Cheers,
Franco


I just created the 24.1.1 installation.
I was running OPNsense on bare metal and now switched to Proxmox.
I described the way I did it in this post https://forum.opnsense.org/index.php?topic=38942.msg190682#msg190682.

Anything I can check in my config that could be a potential problem?

February 21, 2024, 03:58:57 PM #21 Last Edit: February 21, 2024, 04:35:22 PM by franco
> Anything I can check in my config that could be a potential problem?

Just revert the patch as stated above. That's enough to diagnose the issue on 24.1.2.

> # opnsense-patch 3340a32473


Cheers,
Franco

# opnsense-patch 340a32473
or
# opnsense-patch 3340a32473

I guess it is the second one, since it matches the GitHub link, correct? Just to be double-safe....

Yeah, the one that works is preferable. :) Sorry for the typo.

OK. Following behaviour:

1. Updated again to 24.1.2 -> Wireguard did not work.
2. Applied the patch and rebooted. -> Wireguard did not work
3. Restarted Wireguard -> Wireguard worked
4. Reboot again -> Wireguard works

Until now. Everything was checked with my Android phone.

5. Reboot again -> Wireguard does not work on Android. However, iPad works.
A few connects and disconnects with both, Android and iPad. Suddenly both of them are working.


I tested Wireguard with the mobile LTE network but also out of my WLAN. Both showed the same behaviour.
Either both work, or both do not work.

Also, both of my tunnels, split and full, showed the same behaviour.

This is difficult to nail down...

Anything that I could test now with the patched 24.1.2 installation?
Otherwise I would revert back to 24.1.1, reinstall 24.1.2 and continue testing to see if it is the same unstable behaviour....
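
One thing that could make the "works / does not work" observation less dependent on the phone is looking at the handshake timestamps on the firewall itself. The following is only a minimal sketch: it assumes the instance is called wg0 (yours may differ), that Python 3 and the `wg` tool are available on the box, and the 180 second threshold is an arbitrary choice of mine, not an official value. `wg show <interface> latest-handshakes` prints one line per peer with the public key and a Unix timestamp (0 means no handshake yet).

```python
import subprocess
import time


def stale_peers(wg_output, max_age=180, now=None):
    """Return public keys whose last handshake is missing or older than max_age seconds.

    wg_output is the text printed by `wg show <interface> latest-handshakes`:
    one peer per line, public key and Unix timestamp separated by whitespace.
    """
    now = time.time() if now is None else now
    stale = []
    for line in wg_output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected lines
        pubkey, stamp = parts
        last_handshake = int(stamp)
        # A timestamp of 0 means the peer has never completed a handshake.
        if last_handshake == 0 or now - last_handshake > max_age:
            stale.append(pubkey)
    return stale


def read_handshakes(interface="wg0"):
    """Run wg for one interface (the name wg0 is an assumption) and return its output."""
    result = subprocess.run(
        ["wg", "show", interface, "latest-handshakes"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling `stale_peers(read_handshakes())` right after a reboot and again after reconnecting the client would show whether the handshake ever completes, independent of what the Android or iPad app reports.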

Well it already sounds like a configuration instability. The following bullet points would be helpful:

(1) Do you use DNS entries as endpoint addresses?
(2) Do you use tunnel addresses on your instances?
(3) Do you have allowed IPs on your peers?
(4) Do you have the instances assigned as interfaces?
(5) If yes for (4) do you have an IPv4/IPv6 mode set in the interface?
(6) If yes for (4) do you have VIPs assigned to these interfaces?


Cheers,
Franco

Hello,
I also have some strange problems after the update. I don't want to hijack this thread, but I think it might be the same origin that manifests differently for everyone.

OPNSense A:
Update and direct reboot
Everything seemed to work fine, but later today (the day after the update) I received error messages that some servers were not reachable; the cause was a DNS problem. According to the GUI, Unbound was not running, BUT the Internet via browser on the clients was working, so part of the DNS server must have been running. A reboot of OPNSense seemed to have fixed the problem, but I'll have to wait and see tomorrow.

OPNSense B:
Update and direct reboot
- A device can no longer connect to its cloud server.
I can address the device within my internal network (several VLANs routed via OPNSense), so the routing must basically work
- Internet access on my test client worked, websites could be loaded
- a "ping google.de" on the same test client shows no connection
- a "tracert google.de" stops at the OPNSense
- DNS worked, as both of the above commands were able to resolve an IP. I tried it with 3 different hosts, always the same behavior
- a restart of Unbound brought no change
- I checked to see if there was another update available on the OPNSense - the update routine could not connect to the update server either
After rebooting the OPNSense, everything seemed to work again (device had cloud connection, ping worked again, tracert worked again) - I made no other changes!

P.S. My Wireguard worked at least after the second reboot, before that I don't know.

Both OPNSense machines have been running for several years, nothing was changed in the configurations before the update. So it seems that something is sporadically unstable.

February 21, 2024, 06:09:13 PM #27 Last Edit: February 21, 2024, 06:12:31 PM by gstyle
Quote: (1) Do you use DNS entries as endpoint addresses?
Yes, I have a dynamic IP, so I have a dyndns domain pointing to my OPNsense router.

Quote: (2) Do you use tunnel addresses on your instances?
Yes, this is the entry for the respective instance: 10.21.4.1/24,fd21:04::01/64
And allowed IPs for the peers. For example: 10.21.4.4/32,fd21:04::04/128
These addresses are then in the interface section of the client.

Quote: (3) Do you have allowed IPs on your peers?
Yes, different for split and full tunnel:
Full tunnel allowed IPs: 0.0.0.0/0,::/0
Split tunnel allowed IPs: 10.21.0.0/16

Quote: (4) Do you have the instances assigned as interfaces?
Yes

Quote: (5) If yes for (4) do you have an IPv4/IPv6 mode set in the interface?
IPv4 and IPv6 Configuration Type set to "none"

Quote: (6) If yes for (4) do you have VIPs assigned to these interfaces?
No
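
To double-check the addressing from (2), a quick sanity test is whether each peer's allowed IP really lies inside the instance's tunnel network. This is just a sketch using Python's standard `ipaddress` module with the values quoted above; the function name is made up for illustration and this is not how OPNsense itself validates the configuration.

```python
import ipaddress


def peers_outside_tunnel(tunnel_addresses, peer_allowed_ips):
    """Return the peer entries that are not contained in any tunnel network.

    Both arguments are comma-separated strings as entered in the GUI,
    e.g. "10.21.4.1/24,fd21:04::01/64".
    """
    networks = [
        ipaddress.ip_interface(addr.strip()).network
        for addr in tunnel_addresses.split(",")
    ]
    outside = []
    for entry in peer_allowed_ips.split(","):
        entry = entry.strip()
        peer = ipaddress.ip_network(entry, strict=False)
        # subnet_of() only works within one IP version, so compare versions first.
        if not any(peer.version == net.version and peer.subnet_of(net) for net in networks):
            outside.append(entry)
    return outside


# Values from the answers above: an empty list means every peer address fits.
print(peers_outside_tunnel("10.21.4.1/24,fd21:04::01/64",
                           "10.21.4.4/32,fd21:04::04/128"))
```

With the addresses given here nothing is reported, so the tunnel addressing itself looks consistent.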


Reading the questions:
I just realized that I completely forgot about the DynDNS. I mean the time it needs to update.
I was super quick with testing. What a shame, if this would be the reason..... :-[

So I just rolled back to 24.1.1, updated again to 24.1.2 (without the patch).
I will now test again and have a look at the DynDNS topic....
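
To rule out the DynDNS delay while testing, one could wait until the hostname actually resolves to the current WAN address before connecting a client. A minimal sketch, assuming a hypothetical hostname and IP (both are placeholders, not my real values) and using only the Python standard library; the retry count and delay are arbitrary:

```python
import socket
import time


def resolve_ips(hostname):
    """Resolve a hostname to the set of addresses the system resolver returns."""
    infos = socket.getaddrinfo(hostname, None)
    return {info[4][0] for info in infos}


def wait_for_dns(hostname, expected_ip, attempts=10, delay=30,
                 resolve=resolve_ips, sleep=time.sleep):
    """Poll until hostname resolves to expected_ip.

    Returns the attempt number on success, or None if it never matched.
    resolve and sleep are parameters so the loop can be tried without network access.
    """
    for attempt in range(1, attempts + 1):
        try:
            if expected_ip in resolve(hostname):
                return attempt
        except OSError:
            pass  # resolution failed (socket.gaierror is an OSError); retry
        if attempt < attempts:
            sleep(delay)
    return None


# Example call with placeholder values:
# wait_for_dns("myrouter.example.org", "203.0.113.9")
```

If this only succeeds after a few attempts following a reboot, the failed connections right after the update would simply have been clients still using the old address.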



This is potentially a side-effect of something else: A while ago, there were reports of services not starting after a reboot. Franco suspected a race condition.

I experienced something to this extent after upgrading 4 instances on one of them: HAproxy did not start correctly.
This could be fixed by "Reload all services" from the console, a full reboot was not necessary.

In my case, this may have been caused by a slow IPv6 DHCP on my ISPs side on a DS-Lite connection.

Quote: Reading the questions:
I just realized that I completely forgot about the DynDNS. I mean the time it needs to update.
I was super quick with testing. What a shame, if this would be the reason..... :-[

So I just rolled back to 24.1.1, updated again to 24.1.2 (without the patch).
I will now test again and have a look at the DynDNS topic....


So...
After a clean update to 24.1.2 and a few minutes of just waiting and doing nothing, everything works nicely...  :)

So DynDNS could be an explanation....
However, there might also have been something else, especially because I was not able to start OPNsense at all yesterday.... no idea....

Thanks for the great support!
Just made a little PayPal donation to the OPNsense project.