Traceroute / ICMP issue after 24.7.1 update

Started by MeltdownSpectre, August 08, 2024, 07:16:38 PM

Updated to 24.7.1 earlier today. All went well, except I can no longer run traceroutes from any Windows machines, on any VLAN.

Traceroutes from a Linux machine (my Raspberry Pi for example) work just fine, and traceroutes from the OPNsense Web GUI are working properly as well.

My ISP has routing / peering issues with some server providers sometimes, so I use WinMTR often to diagnose issues and report them so they can get resolved.

However, after the 24.7.1 update, something funky seems to be happening with ICMP: anything after the first hop gets dropped and I just see 'Request timed out'.

I haven't added any new rules recently, and my existing firewall rules are exactly the same as they were before updating.

As I understand, Windows traceroutes use ICMP whereas on Linux they use UDP.

Any tips on how to go about diagnosing this or any insight on what changed with 24.7.1 that suddenly started causing this? It was fine on all previous versions, including 24.7_9.

Screenshots attached (Linux vs Windows).

https://imgur.com/a/yhDp4Jo


Interesting. Maybe this also explains this:

mtr -rzw opnsense.org
Start: 2024-08-08T19:53:20+0200
HOST: mpp                                  Loss%   Snt   Last   Avg  Best  Wrst StDev
  1. AS???    opnsense                  0.0%    10    0.4   0.4   0.3   0.4   0.0
  2. AS???    ???                          100.0    10    0.0   0.0   0.0   0.0   0.0
  3. AS???    ???                          100.0    10    0.0   0.0   0.0   0.0   0.0
  4. AS???    ???                          100.0    10    0.0   0.0   0.0   0.0   0.0
  5. AS???    ???                          100.0    10    0.0   0.0   0.0   0.0   0.0
  6. AS???    ???                          100.0    10    0.0   0.0   0.0   0.0   0.0
  7. AS???    ???                          100.0    10    0.0   0.0   0.0   0.0   0.0
  8. AS???    ???                          100.0    10    0.0   0.0   0.0   0.0   0.0
  9. AS???    ???                          100.0    10    0.0   0.0   0.0   0.0   0.0
10. AS60781  178.162.131.118               0.0%    10   20.2  20.3  20.2  20.6   0.1


As soon as I switch to ICMP or TCP it's working again.

Quote from: Chaosphere64 on August 08, 2024, 07:55:02 PM
As soon as I switch to ICMP or TCP it's working again.

I believe mtr uses ICMP by default, and if I use

mtr dns.google

I get a result just like yours. However, if I use the -T or -u flags (for TCP or UDP) then the trace works normally.
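For anyone who wants to compare the probe protocols side by side, here is a minimal sketch. It assumes a Linux client with mtr installed and enough privileges for raw sockets; TARGET, COUNT and RUN are illustrative variables of this script, not mtr settings. It only uses the -r, -w, -c, -u and -T flags.

```shell
#!/bin/sh
# Sketch: run the same mtr report once per probe protocol to see which
# ones get past the first hop. TARGET and COUNT are illustrative defaults.
TARGET="${TARGET:-dns.google}"
COUNT="${COUNT:-10}"

# Map a protocol name to the mtr flag that selects it.
# ICMP is mtr's default, so it needs no flag.
mtr_flag() {
    case "$1" in
        icmp) echo "" ;;
        udp)  echo "-u" ;;
        tcp)  echo "-T" ;;
        *)    echo "unknown protocol: $1" >&2; return 1 ;;
    esac
}

# Run one report per protocol.
compare_protocols() {
    for proto in icmp udp tcp; do
        flag=$(mtr_flag "$proto") || return 1
        echo "== $proto =="
        # $flag is left unquoted on purpose so an empty flag expands to nothing.
        mtr -rw -c "$COUNT" $flag "$TARGET"
    done
}

# Nothing is sent unless RUN=1 is set, so the script is safe to source.
if [ "${RUN:-0}" = "1" ]; then
    compare_protocols
fi
```

Run it as RUN=1 sh ./script.sh. If only the ICMP report shows loss after hop 1, that matches what is described above.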

I can confirm this. Probably, the default ICMP rules have changed... UDP or TCP work fine.
Intel N100, 4 x I226-V, 16 GByte, 256 GByte NVME, ZTE F6005

1100 down / 770 up, Bufferbloat A

My bet is on https://www.freebsd.org/security/advisories/FreeBSD-SA-24:05.pf.asc which pulled in hundreds of lines of changes in the pf ICMP handling code. I've seen it previously pass by on stable/14 and I wasn't planning to merge it right away, but the SA tipped the scale in favour of including it.

# opnsense-update -kr 24.7

If the old kernel works it's probably that.


Cheers,
Franco

You guys are right. I was under the impression that mtr on Unix/Linux would use UDP by default, like the traceroute implementation on these platforms does, which is clearly not the case.

So, it's ICMP that's affected here.

I allow ICMP echo and IPV6-ICMP echo from anywhere to anywhere, because I do believe in troubleshooting tools and not quite as much in security by obscurity.

To me it looks like the previous version of pf treated an ICMP time exceeded in reply to an ICMP echo as part of the same state/connection and hence permitted the reply in. Now it doesn't. I wonder if this was intended.

Also simply permitting ICMP time exceeded in addition to echo does not help, because the NAT state is missing in case of IPv4.

Back to the drawing board, FreeBSD ;)
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

Quote from: franco on August 08, 2024, 09:58:22 PM
My bet is on https://www.freebsd.org/security/advisories/FreeBSD-SA-24:05.pf.asc which pulled in hundreds of lines of changes in the pf ICMP handling code. I've seen it previously pass by on stable/14 and I wasn't planning to merge it right away, but the SA tipped the scale in favour of including it.

# opnsense-update -kr 24.7

If the old kernel works it's probably that.


Cheers,
Franco

With the old kernel it works again as it should.

Quote from: Patrick M. Hausen on August 08, 2024, 10:35:03 PM
Back to the drawing board, FreeBSD ;)

Indeed. "Crafted Echo Request packet after a Neighbor Solicitation (NS) can trigger an Echo Reply" - oh noes, shields up, Captain! We are doomed! ☠️😱

Best to revert this broken fix, IMO.

Same issue here after a two-hop upgrade 24.1.10 -> 24.7 -> 24.7.1.

mtr is not working on Ubuntu Linux and macOS clients. The macOS traceroute is working though.

Can see the ICMP Echo Reply being blocked on WAN in firewall log for state violation. Opening all incoming ICMP on WAN does not help.
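To look at the same thing from a shell on the firewall, something like the sketch below may help. It assumes pf logs to the usual pflog0 interface and that the blocking rule has logging enabled; DRY_RUN and the run helper are hypothetical conveniences of this script, and by default it only prints the commands.

```shell
#!/bin/sh
# Sketch only: prints the commands by default; set DRY_RUN=0 to execute as root.
DRY_RUN="${DRY_RUN:-1}"

# Echo a command, and execute it only when DRY_RUN is not 1.
run() {
    echo "+ $*"
    [ "$DRY_RUN" = "1" ] || "$@"
}

# Show the pf state table; look for the client's ICMP echo entries here.
show_states() {
    run pfctl -s states
}

# Watch packets that pf logged, limited to ICMP and ICMPv6.
# pflog0 is an assumption; adjust if your log interface differs.
watch_blocked() {
    run tcpdump -n -e -ttt -i pflog0 'icmp or icmp6'
}

show_states
watch_blocked
```

Start a trace from a client while watch_blocked is running and compare the blocked packets against the entries from show_states.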

My first bet is also that it is related to the ICMPv6 security fix.
Bare Metal Lenovo ThinkCentre M720q i3-8100T 8GB
Intel I350-F4 Quad-Port Gigabit
Cable 600Mbps Down / 25 Mbps Up

So it's the new kernel? Anybody confirmed it? Might also be possible to confirm with pfctl -d / test traceroute / pfctl -e as a quick test that pf is doing it.
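A minimal sketch of that quick test, so pf is never left disabled by accident. WINDOW and DRY_RUN are hypothetical variables of this script, not OPNsense settings, and by default it only prints what it would do. Keep in mind that disabling pf also disables NAT, so IPv4 traces from clients behind NAT will not tell you much during the window.

```shell
#!/bin/sh
# Sketch only: prints the commands by default; set DRY_RUN=0 to execute as root.
# WINDOW is the number of seconds pf stays disabled.
WINDOW="${WINDOW:-60}"
DRY_RUN="${DRY_RUN:-1}"

# Echo a command, and execute it only when DRY_RUN is not 1.
run() {
    echo "+ $*"
    [ "$DRY_RUN" = "1" ] || "$@"
}

pf_test_window() {
    # Re-enable pf even if the script is interrupted during the wait.
    trap 'run pfctl -e; exit 1' INT TERM
    run pfctl -d          # disable pf (this also turns off NAT)
    run sleep "$WINDOW"   # run the traceroute from a client during this window
    run pfctl -e          # turn pf back on
    trap - INT TERM
}

pf_test_window
```

If the trace works during the window and breaks again afterwards, pf is the component dropping the replies.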

To be frank we're doomed when we ship security updates too late according to some.

And now we're doomed because we ship security fixes in a timely manner, because the same corner that said we don't ship them soon enough feeds suboptimal patches to FreeBSD.

Isn't it ironic...

Jokes aside, this should probably be reported to https://bugs.freebsd.org but at this point I have no hopes somebody even cares, given the number of past and pending issues in that general direction.


Cheers,
Franco

Quote from: franco on August 09, 2024, 08:10:55 AM
So it's the new kernel? Anybody confirmed it? Might also be possible to confirm with pfctl -d / test traceroute / pfctl -e as a quick test that pf is doing it.

To be frank we're doomed when we ship security updates too late according to some.

And now we're doomed because we ship security fixes in a timely manner, because the same corner that said we don't ship them soon enough feeds suboptimal patches to FreeBSD.

Isn't it ironic...

Jokes aside, this should probably be reported to https://bugs.freebsd.org but at this point I have no hopes somebody even cares, given the number of past and pending issues in that general direction.


Cheers,
Franco

The bug was not present on 24.7 and was definitely introduced with 24.7.1.

Any chance this particular change can be reverted?

> Any chance this particular change can be reverted?

No plans for today. You can revert the kernel as suggested. This needs attention in FreeBSD either way.


Cheers,
Franco

With pfctl -d, the problem is gone, at least with IPv6. Since pfctl -d also turns off NAT, I cannot test IPv4.