Traceroute / ICMP issue after 24.7.1 update

Started by MeltdownSpectre, August 08, 2024, 07:16:38 PM

Yes, Wednesday is a safe bet.


Cheers,
Franco

Running ICMP2 kernel for 22 hours now and not seeing any reconnect from the Chromecast!
I think my Chromecast is the problem.

It could be the patches. I'm pretty sure they don't cover all edge cases yet. But as noted it's not good to revert to the old state because I don't think we'll see fixes sooner then if we're the only ones noticing.


Cheers,
Franco

Just installed 24.7.2

Traceroutes / ICMP behaviour seems back to normal (for IPv4 at least). I don't use IPv6 so can't test that.

Huge thanks to Franco for getting an update out so quickly to fix it, and to doktornotor and the others for testing and submitting the bug report to FreeBSD.

I would also like to thank everyone who helped test this! :)

I think this isn't over yet, but at least we are one step further:

https://forum.opnsense.org/index.php?topic=42270.0
https://github.com/opnsense/src/issues/217


Cheers,
Franco

Updated to 24.7.2, but I'm still losing some random packets on IPv6 ping (outbound, to a target only one hop past my firewall)

--- ipv6.abc.xyz ping statistics ---
1566 packets transmitted, 1547 received, 1.21328% packet loss, time 1567560ms
rtt min/avg/max/mdev = 0.185/2.232/1482.222/43.446 ms, pipe 2


I was more stable on "24.7.1-pf4" (no packet loss)
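
For anyone comparing runs between kernels, the loss figure in the summary above can be re-derived from the raw counts. This is a minimal sketch, assuming the Linux iputils-style summary line shown in the post; the function name `parse_ping_summary` is made up for illustration and is not part of any tool.

```python
import re


def parse_ping_summary(text):
    """Extract counts from an iputils-style ping summary line (assumed format)."""
    match = re.search(r"(\d+) packets transmitted, (\d+) received", text)
    if match is None:
        raise ValueError("no ping summary line found")
    sent = int(match.group(1))
    received = int(match.group(2))
    lost = sent - received
    # Loss is the share of transmitted packets that never got a reply.
    loss_pct = 100.0 * lost / sent if sent else 0.0
    return sent, received, lost, loss_pct


# Summary line copied from the post above.
summary = "1566 packets transmitted, 1547 received, 1.21328% packet loss, time 1567560ms"
sent, received, lost, loss_pct = parse_ping_summary(summary)
print(f"{lost} of {sent} packets lost ({loss_pct:.5f}%)")
```

With the numbers above that is 19 lost replies out of 1566 over roughly 26 minutes, which matches the reported 1.21328%. Running the same calculation against a capture from "24.7.1-pf4" would make the before/after comparison concrete.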


It seems like shipping the FreeBSD fixes was not the right decision after all, as there are still problems with neighbor discovery and upstream does not care all that much:

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=280701
Intel N100, 4 x I226-V, 16 GByte, 256 GByte NVME, ZTE F6005

1100 down / 770 up, Bufferbloat A

Not surprising, but it is what it has been for a while now. The bug was closed mainly to wrap up the revised SA and ship it in 14.1-RELEASE-p4. These types of bugs often have a long round-trip time when it comes to the remaining issues.

Shipping the initial SA in all supported versions of FreeBSD with the scope of hundreds of lines of code changed was a release engineering mistake. This is clear and simple.

That being said I appreciate that someone actually went ahead and fixed the main mistakes with it without making an immediate scene about it.


Cheers,
Franco

Ok, if someone wants to file a new bug with FreeBSD for the remaining regressions caused by patching a security non-issue, feel free. I've had it with upstream for some time, it may be harmful to mental health apparently.

Could not resist commenting on the whole SA - https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=280701#c40

@Uwe I'm sure we can test on FreeBSD, but they need to give a direction of what they actually want instead of saying that some FreeBSD commit actually is solely a downstream issue... It actually doesn't matter if we test on FreeBSD or OPNsense kernel because we talk about the same code change. This is mind boggling to me. :)

@doktornotor appreciate what you did there from the start

The exercise in patience and restraint that this one of many sagas has required is commendable.

That discussion on the FreeBSD list is wild. People report issues in good faith, and show that reverting the offending code fixes things. At the same time, FreeBSD maintainers are deaf, point to others and just close the topic :o
Probably the easiest option is to just not ship kernels with the offending "fixes", and when pfSense hits the same issue, people will probably believe it :-X :o

Quote from: franco on August 23, 2024, 11:32:50 AM
It actually doesn't matter if we test on FreeBSD or OPNsense kernel because we talk about the same code change. This is mind boggling to me. :)

Yeah, it doesn't matter - except for the obvious fact that the guy who committed all of this code without any test coverage, causing the regressions here, uses the FreeBSD code on the "other project" and does not care about breaking stable FreeBSD releases at all, since that other project happens to run on -CURRENT.

For future interaction with upstream - how much trouble / overhead would it be to ship a matching vanilla FreeBSD kernel for regression debugging purposes? I mean, identical config, just no patches. With the boot environments (snapshots) in place, it shouldn't be much of an issue even if it fails to boot altogether.

August 23, 2024, 12:06:17 PM #119 Last Edit: August 23, 2024, 12:12:56 PM by meyergru
I know that it doesn't matter at all (and I indicated it in the bug report), all that would do is to provide proof that it is not a downstream issue, so they are forced to come out of their corner and admit that their fix cannot be accepted as final.

When you think about it, you can:

1. Prove that it is an upstream issue and hope for an upstream fix.
2. Wait for the "other project" to stumble over this as well.

If you opt for 1, 24.7.3 could potentially fix the problem going forward and be done with it. If you go with 2, you should revert the patches for OPNsense if you want an intermediate fix. In the latter case, you would have to touch it again if/after a real upstream fix arrives. On the other hand, choosing option 1 would help the "other project" - maybe without them even ever knowing.

P.S.: This is only a discussion of how to proceed on OPNsense's behalf, as clearly I also do not like how upstream shrugs this off as an "OPP" (other people's problem, or in German: "PAL" (Problem anderer Leute)). 8)