[Resolved, sort of ...] OpnSense - no internet

Started by tverweij, October 11, 2023, 02:54:58 PM

October 11, 2023, 02:54:58 PM Last Edit: October 11, 2023, 08:51:43 PM by tverweij
I have 3 OPNsense firewalls running, all on 23.7.5, all as virtual machines on Hyper-V 2019 DC.

This morning one of them suddenly stopped working correctly; there is no internet access anymore from any LAN segment. The live view no longer shows any activity but, strangely, all VPNs work flawlessly.

As this physical server (the one where OpnSense is not working anymore) is the only one that got the Windows Updates yesterday, I am in the process of uninstalling these updates now.
I hope this solves the problem, I will share the results.

One more point: the firewall itself has internet access; only the LAN segments do not.

Uninstall of the updates did NOT do the trick - still no internet and no activity in the live log.

Anyone any thoughts?

Check your VM config and verify that it's actually passing traffic to the VM.  If nothing changed in OPNSense and the only machine having issues had Windows Updates, go through and verify everything.  Start with no assumptions of anything working.

No problems found - rolled back the updates, behaviour stays the same.

In the bootlog, I see:
   filter_configure_sync[285] failed.
several times.
On the working firewalls, there is no failed entry in the bootlog.

I do not know what this means, can anyone explain?

Just because Windows says it rolled back the updates doesn't necessarily mean that it did.

That error appears to be an issue with an HA setup sync, which would make sense if your VM connectivity is broken.  It wouldn't show up on the others because they're able to talk.

You need to verify all of the base details of your VM config and hosting setup.  Without doing that you're just going to be chasing your tail.

VM connectivity is not broken ....

Working flawlessly:
- OpenVPN: clients can connect and run RDP over this connection
- IPSec (Routing): the other LANs on the other OPNsense firewalls can be accessed normally (intercontinental connection)
- IPSec client connections: working perfectly.
- Ping and lookup from the firewall itself: works normally.

So the routing between the LANs and the VPN adapters works.
The VPN adapters can connect (going over the WAN).

Only routing from LAN to internet fails; routing to anything else (which also goes over the same WAN) works.

Well, restored the complete machine with a backup from Saturday, and it is working again.
Restoring the config from Saturday did not work.

Changing the backup from weekly to daily now.

It's about 8 hours since the previous firewall failed.
Now the next one fails (other hardware, other continent) ....

I know it is hard to take time to diagnose when the system is broken and you must restore functionality, but are you able to run diagnostics and try to pinpoint, not the root cause, but where the problem is?
"...suddenly stopped working correctly; there is no internet access anymore from any LAN segment" describes the symptom but gives no clue as to what the reason might be. From a distance, could you mean that DNS is not resolving and hence it looks like there is no access? Or is there actually a problem with routing? For that, the usual network engineer diagnostics are required. Print your routes, your interface definitions (ifconfig), your traceroutes, etc.
Also, virtualisation is another layer of complexity to account for and diagnose.
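
To make the "print your routes, your interfaces" step concrete, here is a minimal sketch of a script that captures that state in one go so it can be pasted into a post. It assumes Python 3 is available on the firewall shell and that the standard FreeBSD tools netstat, ifconfig and pfctl are on the PATH (pfctl needs root). The choice of commands and the helper names (run_command, collect, format_report) are my own assumptions, not something from this thread. The NAT and filter rule listings are included because the firewall itself reportedly has internet access while the LANs do not, which might point at the loaded rules rather than at the routing table.

```python
import subprocess

# Assumed command set: standard FreeBSD tools. Adjust to taste.
COMMANDS = {
    "routes": ["netstat", "-rn"],             # routing table, numeric
    "interfaces": ["ifconfig"],               # interface definitions
    "nat_rules": ["pfctl", "-s", "nat"],      # NAT rules currently loaded in pf
    "filter_rules": ["pfctl", "-s", "rules"], # filter rules currently loaded in pf
}


def run_command(argv):
    """Run one command and return its combined output, or a failure marker."""
    try:
        result = subprocess.run(argv, capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"<failed to run: {exc}>"
    return result.stdout + result.stderr


def collect(commands=COMMANDS, runner=run_command):
    """Run every command and return a dict mapping section name to output."""
    return {name: runner(argv) for name, argv in commands.items()}


def format_report(results):
    """Join the collected outputs into one text block with section headers."""
    parts = []
    for name, output in results.items():
        parts.append(f"===== {name} =====\n{output.rstrip()}\n")
    return "\n".join(parts)


if __name__ == "__main__":
    print(format_report(collect()))
```

Running it once on a working firewall and once on the failing one, then diffing the two reports, would be one way to narrow down where the two differ.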

October 11, 2023, 10:17:40 PM #10 Last Edit: October 11, 2023, 10:24:07 PM by tverweij
When 30+ people cannot work, there is no time to investigate.

But the symptoms are:
- Incoming traffic works
- Outgoing traffic to the VPNs works
- Outgoing traffic to the internet does not
- Outgoing traffic to the internet from the firewall itself works
- Edit: Traffic between LANs (on the same firewall and on the other side of the VPNs) works

When I do a tracert to any IP address, the only adapter that answers is the gateway.
So indeed, DNS does not work, as the DNS servers are on the internet.
But raw traffic to an IP address does not work either, as the route stops at the gateway.
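
For anyone who wants to repeat the "is it DNS or is it routing" check from a LAN client, a minimal sketch in Python follows. It compares a name lookup with a TCP connection to a literal IP address. The test host, IP address and port in the main block are placeholders I picked, and the function names are hypothetical; only the classification logic matters.

```python
import socket


def can_resolve(hostname):
    """Return True if the system resolver can resolve the hostname."""
    try:
        socket.getaddrinfo(hostname, None)
        return True
    except socket.gaierror:
        return False


def can_connect(ip, port=443, timeout=3.0):
    """Return True if a TCP connection to a literal IP address succeeds."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


def classify(dns_ok, ip_ok):
    """Turn the two check results into a rough verdict."""
    if ip_ok and dns_ok:
        return "ok"
    if ip_ok:
        return "dns"             # IP traffic passes, only name resolution fails
    return "routing-or-nat"      # raw IP traffic fails, whatever DNS does


if __name__ == "__main__":
    # Placeholder targets (assumptions): replace with hosts you trust.
    dns_ok = can_resolve("example.com")
    ip_ok = can_connect("1.1.1.1", 443)
    print(classify(dns_ok, ip_ok))
```

With the symptoms listed above (raw traffic to an IP address fails), this would report "routing-or-nat", so the DNS failure looks like a consequence rather than the cause.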

Edit2: As I said in another thread:
If any developers from Deciso B.V. are reading this: if you contact me, I can deliver 2 disks with (Hyper-V) installations that have the problem, for investigation.

Indeed, there is no time to investigate, but this information, useful as it is, does not give enough detail to tell what the problem might be. And transplanting the disks does not replicate the environment, so it would be of very limited use.
By the way, statements are fine when accompanied by captures of the diagnostics; otherwise they are open to interpretation.
I would get off Hyper-V for starters. It is not a great hypervisor for FreeBSD.
Next thing, if you can, try bare metal.
After that, consider commercial support for a couple of hours to diagnose, if there is no in-house networking expertise.

October 12, 2023, 03:05:13 PM #12 Last Edit: October 12, 2023, 03:07:39 PM by tverweij
I was thinking about what had changed.

I have 3 firewalls, and so far 2 have failed.
Also, I have been running OPNsense since the second half of August and had not encountered anything like this before.

Changes made in the last days:
1. Upgraded to 23.7.5 on all three firewalls
2. Working on an incoming SNAT workaround

It can't be 1, as only 2 of 3 failed (configurations are comparable).

For 2: the Description field of some port forwards was changed.
The description was: "PortForward: my pf description"
This was changed to: "SNAT x.x.x.x #PortForward: my pf description"

This last change was only done on the 2 firewalls that failed, not on the one that did not fail.

As this description is used to generate a rule on the WAN, this might be the cause (maybe the # character?)
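
To check that theory against a config backup, a minimal sketch in Python is below. It lists every description containing a "#" in an exported configuration. It assumes the export is XML and that rule descriptions live in elements named descr; that element name and the sample layout are my assumptions, so verify them against your own export first. The function name is hypothetical, and the sample only reuses the two descriptions quoted above.

```python
import xml.etree.ElementTree as ET


def find_suspicious_descriptions(xml_text, suspicious="#"):
    """Return the text of every <descr> element containing a suspicious character.

    Assumption: descriptions are stored in elements named "descr".
    """
    root = ET.fromstring(xml_text)
    hits = []
    for elem in root.iter("descr"):
        text = elem.text or ""
        if any(ch in text for ch in suspicious):
            hits.append(text)
    return hits


# Made-up sample in an assumed layout, using the descriptions from this post.
SAMPLE = """<opnsense>
  <nat>
    <rule><descr>PortForward: my pf description</descr></rule>
    <rule><descr>SNAT x.x.x.x #PortForward: my pf description</descr></rule>
  </nat>
</opnsense>"""


if __name__ == "__main__":
    for descr in find_suspicious_descriptions(SAMPLE):
        print(descr)
```

This only finds the candidates; it does not prove that the "#" is what breaks the generated rule. Renaming one description on a test machine and reloading the filter would be the actual test.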

Quote from: cookiemonster on October 11, 2023, 10:32:39 PM
Indeed, there is no time to investigate, but this information, useful as it is, does not give enough detail to tell what the problem might be. And transplanting the disks does not replicate the environment, so it would be of very limited use.
By the way, statements are fine when accompanied by captures of the diagnostics; otherwise they are open to interpretation.
I would get off Hyper-V for starters. It is not a great hypervisor for FreeBSD.
Next thing, if you can, try bare metal.
After that, consider commercial support for a couple of hours to diagnose, if there is no in-house networking expertise.

Well, I can be brief about that: if it comes down to a choice between OPNsense and Hyper-V, OPNsense will go.

Have you tried different emulated network interfaces? I don't know what Hyper-V offers, but while paravirtualised does have the least overhead, Intel E1000 is considered the most robust with FreeBSD guests.