OPNsense Forum
Archive => 23.1 Legacy Series => Topic started by: 9axqe on May 17, 2023, 09:44:15 am
-
I can reliably (100% of the time so far) crash OPNsense 23.1.7_3-amd64 with a specific ping command from my computer (connected to the LAN interface). No traffic passes after that, and when the GUI finally becomes available again, I see "The system is currently booting. Not all services have been started yet."
The command I use (running macOS):
sudo ping6 -G 1508,1410 -D 2001:4860:4860::8844
The effect is immediate for me: it takes about a second, then the router crashes.
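For context, the size arithmetic of that sweep works out as follows (interpreting `-G 1508,1410` as a max,min payload sweep, which matches the `PING6(40+8+[...] bytes)` format macOS ping6 prints later in the thread; the header constants are standard IPv6/ICMPv6 sizes):

```shell
# Each probe is 40 bytes IPv6 header + 8 bytes ICMPv6 header + swept payload,
# so the sweep crosses both a 1500-byte LAN MTU and a 1280-byte WAN MTU.
min=1410; max=1508
wire_min=$((40 + 8 + min))
wire_max=$((40 + 8 + max))
echo "on-wire sizes: ${wire_min} to ${wire_max} bytes"
```

Every packet in the 1458..1556-byte range is larger than the 1280-byte WAN MTU, so each one exercises the fragmentation path.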
-
Forgot to mention: I reported the crash via the GUI.
-
I think I remember this one... IPv6 fragmentation across a PBR (policy-based routing) rule. The matching rule on the firewall has a gateway set, right?
Cheers,
Franco
-
You're right: I do have a top-level firewall rule to "protect" pings from the other deny rules (mostly for troubleshooting purposes), and it does send them to a gateway, which forwards the traffic to the WireGuard interface – or should.
I was trying to troubleshoot some MTU issues on my WAN interface, and this actually makes me realise I need to avoid the gateway if I want to troubleshoot that.
-
I'll try the command later today. The main issue was how to reproduce this quickly, so you may have helped out a lot here.
Two things that I'd be interested in if you can help further:
1. What happens if you disable "shared forwarding" on Firewall: Settings: Advanced with the gateway in the rule set?
2. What happens when you don't have a gateway set in the rule? (both shared and non-shared would be interesting)
Thanks,
Franco
-
1. "shared forwarding":
I disabled and re-enabled it twice, testing in between each time, and the issue only occurs when shared forwarding is enabled (checkbox checked). Interestingly, even with "shared forwarding" disabled I somehow get 100% packet loss (but OPNsense does not crash).
2. gateways:
I simply disabled the gateway to the WireGuard interface (the one the ICMPv6 firewall rule points to), forcing traffic onto the default gateway on the WAN interface, and then the issue no longer occurs. "Shared forwarding" was enabled. In this case the pings succeed (at least below a certain packet size).
Let me know if that's sufficient for you.
-
Thanks a lot. Since the relevant code is not available in FreeBSD 13 (where the issue first appeared), I'm inclined to test this shortcut that drops the bad traffic for the time being:
https://github.com/opnsense/src/commit/5d8cfe7c1eb
# opnsense-update -zkr 23.1.6-refragment
I couldn't reproduce this today because I was fighting with prefix delegation inception (not wanting to crash my main box), so I'm not 100% sure it prevents the panic. The traffic should still be dropped, though.
Cheers,
Franco
-
Looks good: I can't get it to crash anymore with this command, even with "Shared forwarding" enabled. I rebooted the router a couple of times to be sure.
On the dashboard it still reports the version as "OPNsense 23.1.7_3-amd64", though; is that expected?
-
Yes; only "uname -a" should report something along the lines of "5d8cfe7", which is the commit hash for the build, since this only replaced the kernel.
Good enough for me. I will add this to the stable/23.1 branch then.
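A minimal post-reboot check could look like this (the `uname_out` string below is a fabricated sample modeled on the output format reported later in the thread; in practice you would feed it from `uname -a`):

```shell
# Check whether the running kernel carries the test commit hash.
# Sample string only; on the firewall you would use: uname_out="$(uname -a)"
uname_out="FreeBSD 13.1-RELEASE-p7 refragment-n250431-5d8cfe7c1eb SMP amd64"
case "$uname_out" in
  *5d8cfe7*) status="refragment kernel active" ;;
  *)         status="stock kernel" ;;
esac
echo "$status"
```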
Thanks again,
Franco
-
PS: The "bad" upstream change appears to be https://github.com/opnsense/src/commit/53a4886d5d which broke the pf(4) end by assuming ip6_forward() was safe in all cases -- well, at least it was before FreeBSD 13. ;)
https://github.com/freebsd/freebsd-src/commit/b52b61c0b6b appears to fix this, but it was never added to stable/13 and doesn't apply cleanly there either, which is why I went with the commit mentioned earlier.
-
This means that at some point in the future such packets will pass through OPNsense again instead of being dropped, but at this point in time there is no clear date or target version for when that will happen; is my understanding correct?
-
I might revisit this sooner rather than later, but since I'm unable to use the upstream work easily here, the ETA is more or less long-term, correct.
The next fixed version would be the one using FreeBSD 14.1, but 14.0 isn't even out yet, so that might be a year away, with 24.7 at the earliest. 23.7 plans to move to FreeBSD 13.2, but the problem is the same there (also a stable/13 branch without any such fixes).
Cheers,
Franco
-
Thanks, it's not a big deal (now that it does not crash anymore), just wanted to check my understanding.
-
Fair enough, happy to share that information from the release side.
Cheers,
Franco
-
Actually, I do have an idea for fixing this without overcomplicating it. It might not be 100% correct, but we could still try to forward/output the packets that are now being dropped. Want to try a patch?
-
Sure thing. This OPNsense box is not "in production" yet (it's for a home office, nothing enterprise-level), and I'm still struggling with some MTU/MSS/segmentation issues that are killing my throughput (I'll start a new thread on that in a while), so I can try a patch, no problem.
And THANKS btw, really amazing work you guys do.
-
All in a days work for bicycle repair man ;)
https://github.com/opnsense/src/commit/1107d69ba909
# opnsense-update -zkr 23.1.6-refragment2
Cheers,
Franco
-
It seems to be working, but somehow something in my DNS setup (AdGuard + Unbound) is broken now. I'm not sure it's related, but I can't access the AdGuard web GUI, which runs on 443 (while the OPNsense web interface is on another port and is still reachable), and Unbound no longer starts. I need to look into this.
uname -a returns:
13.1-RELEASE-p7 FreeBSD 13.1-RELEASE-p7 refragment-n250431-1107d69ba90 SMP amd64
"Shared forwarding" is enabled.
ICMP ping and ping6 are routed to gateways, which send the traffic into the WireGuard tunnel.
WAN MTU is currently set to 1280 bytes.
The ping6 sweep is successful up to 1160 bytes:
sudo ping6 -G 1508,1150 -D 2606:4700:4700::1111
PING6(40+8+[1150...1508] bytes) 2a02:8106:65:84f0:d4bb:7c6a:d30c:3d80 --> 2606:4700:4700::1111
1158 bytes from 2606:4700:4700::1111, icmp_seq=0 hlim=63 time=39.252 ms
1159 bytes from 2606:4700:4700::1111, icmp_seq=1 hlim=63 time=37.917 ms
1160 bytes from 2606:4700:4700::1111, icmp_seq=2 hlim=63 time=34.886 ms
^C
ping is successful up to 1180 bytes:
ping -D -s 1172 quad9.com
PING quad9.com (216.21.3.77): 1172 data bytes
1180 bytes from 216.21.3.77: icmp_seq=0 ttl=49 time=219.187 ms
I don't get how these numbers relate to the MTU value I configured on the interface, though.
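One way to reconcile the numbers, assuming the WireGuard tunnel itself runs over IPv6 on the 1280-byte WAN (the 80-byte encapsulation figure below is an assumption, not something confirmed in the thread):

```shell
# Assumed encapsulation: 40 (outer IPv6) + 8 (UDP) + 32 (WireGuard) = 80 bytes.
wan_mtu=1280
tunnel_mtu=$((wan_mtu - 40 - 8 - 32))     # inner path MTU through the tunnel
icmp6_max=$((tunnel_mtu - 40))            # largest "bytes from" ping6 can report
icmp4_data_max=$((tunnel_mtu - 20 - 8))   # largest working ping -s data size
echo "$tunnel_mtu $icmp6_max $icmp4_data_max"
```

Both observed limits (1160 reported by ping6, 1172 as the largest working `ping -s` value) are consistent with an inner path MTU of 1200 bytes, i.e. the configured 1280 minus the assumed tunnel overhead.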
-
But is AdGuard actually listening on 443 locally? You can verify from https://your.fw/ui/diagnostics/portprobe
The other question is whether you are trying to access it remotely or from a locally attached LAN.
Cheers,
Franco
-
Yes, AdGuard is running locally.
Weirdly, OPNsense can connect to itself on port 443; I just can't from the browser:
Connection to 192.168.1.1 443 port [tcp/https] succeeded!
A packet capture shows a TCP SYN coming in from the computer on port 443 and a TCP RST sent back, and that's it...
-
Maybe it's not going through the tunnel on the other end? A TCP RST might also be a firewall action.
Cheers,
Franco
-
Something is wrong: even uninstalling and reinstalling the plugin, which should make it available on port 3000 again, does not work.
I'd like to double-check that the patch has nothing to do with it; I doubt it, but just to be sure.
Which command would allow me to revert to 23.1.7_3?
-
Ignore that, I found the root cause: my firewall rule to block DoH is somehow blocking local access to AdGuard running on OPNsense port 3000. I don't really understand how that's possible, since I can access the OPNsense web GUI on a different port at the same IP, from the same PC, just fine. I disabled the rule and can suddenly access AdGuard again. Weird. The rule is based on a URL containing a large list of public DoH providers; maybe there's a typo in it, but that still doesn't explain how it blocks a single port...
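If the typo suspicion is right, a first check could be to scan the downloaded provider list for entries that would match the firewall itself (the file path and list contents below are made up for illustration; the real list is the URL-table alias from the rule):

```shell
# Build a fake provider list containing a stray entry for the firewall's
# own LAN address, then count how many lines would match it exactly.
printf 'dns.google\n8.8.8.8\n192.168.1.1\ncloudflare-dns.com\n' > /tmp/doh-list.txt
hits=$(grep -c '^192\.168\.1\.1$' /tmp/doh-list.txt)
echo "entries matching the firewall address: $hits"
```

A nonzero hit count here would mean the alias covers the firewall's own address, so any block rule using it would also catch local traffic to that IP.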