IPv6 Control Plane with FQ_CoDel Shaping

Started by OPNenthu, April 26, 2025, 12:48:44 PM

May 02, 2025, 12:05:08 PM #45 Last Edit: May 02, 2025, 12:08:19 PM by meyergru
Unless there is a less intrusive way of fixing this than a reboot, it should be pointed out as a caveat in the instructions. Would a fw state reset help?
As a matter of fact, this was unexpected for me, and I still can neither reproduce it reliably nor see consistent effects.
Intel N100, 4* I226-V, 2* 82559, 16 GByte, 500 GByte NVME, ZTE F6005

1100 down / 450 up, Bufferbloat A+

Quote from: meyergru on May 02, 2025, 12:05:08 PM: "Unless there is a less intrusive way of fixing this than a reboot, it should be pointed out as a caveat in the instructions."

I agree, but thinking about it, which section of the shaper docs should point it out? This is not specific to the examples only, but applies to the Shaper as a whole. If that is the case, I think it should go under the main Shaper section.

Quote from: meyergru on May 02, 2025, 12:05:08 PM: "Would a fw state reset help?"
Would be worth a try.

@All
If somebody hits this problem, can that person try resetting the fw states and let us know?

Quote from: meyergru on May 02, 2025, 12:05:08 PM: "Matter of fact, for me, this was unexpected and I still can neither reliably reproduce it nor are the effects consistent."
It's interesting that this is happening at all. From the description, one would assume the problem is due to packets not being classified properly, but in that case the shaper would be bypassed and no BW reduction would be visible.


Regards,
S.
Networking is love. You may hate it, but in the end, you always come back to it.

OPNSense HW
N355 - i226-V | AQC113C | 16G | 500G - PROD

PRXMX
N5105 - i226-V | 2x8G | 512G - NODE #1
N100 - i226-V | 16G | 1T - NODE #2

May 03, 2025, 12:03:46 AM #47 Last Edit: May 03, 2025, 12:07:09 AM by OPNenthu
Quote from: meyergru on May 02, 2025, 12:05:08 PM: "Would a fw state reset help?"

Probably not, IMO.  I tested with an OPNsense VM in a double-NAT setup (IPv4-only), so not exactly the same situation, but I did reproduce the issue.

I configured the control & data plane pipes, queues, and rules.  I set the Download pipe to 545Mbit/s and the Upload to 34Mbit/s accounting for the VM/NAT overhead.

After applying the changes I observed a false start in the Bufferbloat test (hung on "Warming Up..."), followed by a semi-successful test (reduced performance on the Download), followed by a second false start.  See "semi_successful.png".

I then reset the F/W states from the Diagnostics menu and gave it a minute to re-establish and settle.  The next couple of Bufferbloat tests did not stall, but the Download performance was still subpar.  This was reproducible.  See "after_reset.png".

Finally I rebooted the VM, and only then did I observe the full performance:  Result
(Sorry, ran out of image quota on this post, so had to crop the second image and could not upload the final one).

N5105 | 8/250GB | 4xi226-V | Community

https://www.youtube.com/watch?v=XI9NG068TwI

May 03, 2025, 02:00:17 AM #48 Last Edit: May 03, 2025, 02:01:55 AM by Seimus
Alright, it looks like the following observations can be made:

A. There really is a glitch or bug when configuring or changing the Shaper
B. The issue causes degraded performance, e.g. lower than expected throughput and/or application stalls (during congestion)
C. This is somewhat reproducible
D. It affects any traffic matched by the Shaper rules
E. Clearing states in pf doesn't fix the problem
F. A FW reboot does fix the problem

So something is wrong either with how OPNsense pushes the config into ipfw/dummynet or with ipfw/dummynet itself.
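
One way to narrow down which side is at fault is to capture what dummynet actually holds right after applying the shaper and again after a reboot, then compare the two. Below is a minimal Python sketch, assuming root shell access on the firewall and a Python 3 interpreter there. The helper names (run_command, snapshot, diff_snapshots) are made up for illustration; only the three ipfw listing commands are real.

```python
import difflib
import subprocess
from typing import Callable, Dict, List

# FreeBSD ipfw subcommands that list dummynet pipes, schedulers and queues.
DUMMYNET_COMMANDS: List[List[str]] = [
    ["ipfw", "pipe", "show"],
    ["ipfw", "sched", "show"],
    ["ipfw", "queue", "show"],
]


def run_command(argv: List[str]) -> str:
    """Run one command and return its stdout (needs root on the firewall)."""
    result = subprocess.run(argv, capture_output=True, text=True, check=True)
    return result.stdout


def snapshot(runner: Callable[[List[str]], str] = run_command) -> Dict[str, str]:
    """Collect the output of every dummynet listing command, keyed by command."""
    return {" ".join(argv): runner(argv) for argv in DUMMYNET_COMMANDS}


def diff_snapshots(before: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """Return unified-diff lines for every command whose output changed."""
    lines: List[str] = []
    for name, old_text in before.items():
        lines.extend(
            difflib.unified_diff(
                old_text.splitlines(),
                after.get(name, "").splitlines(),
                fromfile=f"{name} (before)",
                tofile=f"{name} (after)",
                lineterm="",
            )
        )
    return lines
```

Usage would be something like taking `degraded = snapshot()` while the problem is visible, saving it to a file, rebooting, taking `healthy = snapshot()`, and printing `diff_snapshots(degraded, healthy)`. Packet and byte counters in the listings change constantly, so expect some noise in the diff. The interesting part is whether bandwidth, scheduler, or queue parameters differ between the degraded state and the post-reboot state.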

Regards,
S.

May 09, 2025, 10:10:20 PM #49 Last Edit: May 13, 2025, 04:40:09 PM by vik
This bug is impacting my setup, so I opened a GitHub issue:

https://github.com/opnsense/core/issues/8649

Quote from: vik on May 09, 2025, 10:10:20 PM: "This bug is impacting my setup, opened git issue:"

https://github.com/opnsense/core/issues/8649

Many thanks for reporting it officially and driving it with the devs. I totally missed this one due to other activities. Looks like the issue was found and fixed!

Regards,
S.

The docs are officially published

https://docs.opnsense.org/manual/how-tos/shaper_control_plane.html

Many thanks, everyone, especially OPNenthu & meyergru, for the collaboration and testing you provided ;)

Regards,
S.

July 30, 2025, 10:57:37 PM #52 Last Edit: July 31, 2025, 06:34:05 PM by OPNenthu
Quote from: Seimus on July 24, 2025, 10:49:23 AM: "The docs are officially published [...]"

Cheers @Seimus!  This does raise the bar a bit on networking concepts. 

I've been using the setup from this thread for a few months now without any issues, and since the fix went in for the issue that @vik raised in GitHub, it's been solid.

---

On a separate note, the issue that @meyergru and I saw with the Waveform Bufferbloat test could be something on their end as well.  I'm now seeing that the test is failing to start (stays stuck), but it was working just last night with no changes on my end.  All of the other test sites, such as Cloudflare speed test and speedtest.net, are working fine and with no slowdowns observed.  So just a word of warning that they might have an app issue.  Let's see if anyone else observes the same.

EDIT:  today (Jul. 31) that Bufferbloat test is working again ¯\_(ツ)_/¯

Necro bump-

Are you guys seeing regressions in 26.1.x?  The upload portion of my speed tests has started stalling a lot to where the tests never start (Waveform Bufferbloat) or finish (Cloudflare speedtest).  In the case of the Bufferbloat test it stays stuck on "Warming up" for that portion of the test.

There were some ISP changes in my area recently as they upgraded their infrastructure.  I noticed that my latencies increased a little bit, and I need to redo the pipe widths.  But, I don't know if this had anything to do with the shaping instability.

I also found some posts online where others notice this behavior only with Firefox (?).  I don't have any Chrome based browsers at the moment to try but maybe I should install one and compare.

I reset the shaper a few times while tweaking the pipe b/w, so it's working at the moment.  Also changed out the Ethernet cable, just in case.  Will keep an eye on it.

I did install Chromium under Linux and compared it to Firefox. Chromium seems to give slightly better results in the tests, so the browser engine appears to affect the performance.

Also found this awesome test: https://bufferbloat.libreqos.com/

Wanted to share that link as it seems to give a lot more detailed metrics than the Waveform test and has additional tests (e.g. "Household" sim) and ISP rankings.

Interesting tool, although I get a B for video calls each time and I do not understand why.

P.S.: Waveform still works for me.

Quote from: OPNenthu on Today at 01:07:54 AM: "Are you guys seeing regressions in 26.1.x?  The upload portion of my speed tests has started stalling a lot to where the tests never start (Waveform Bufferbloat) or finish (Cloudflare speedtest).  In the case of the Bufferbloat test it stays stuck on "Warming up" for that portion of the test."

I have seen this behavior (Waveform Bufferbloat test stays stuck on "Warming up") every so often. I suspect intermittent bandwidth or server load issues on their part.
In theory there is no difference between theory and practice. In practice there is.

I saw that in the past when I over-optimized the shaper by not leaving enough headroom in the pipes' bandwidths in pursuit of maximum bandwidth. You cannot have your cake and still eat it.

Today at 11:24:41 AM #58 Last Edit: Today at 11:26:23 AM by OPNenthu
I knocked down my upload pipe width a little bit, though I don't have a lot to play with.  My ISP plan is very asymmetric: 1000Mbps down / 40Mbps up.
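
For anyone tuning the same way, the headroom arithmetic can be sketched like this in Python. The 1000/40 figures are my plan rates from above. The headroom percentages are assumptions for illustration only, not recommended values, since the right figure depends on the line and has to be found by testing. The function name is hypothetical.

```python
def pipe_bandwidth_mbit(line_rate_mbit: float, headroom: float) -> float:
    """Return the pipe bandwidth left after reserving a fraction as headroom.

    headroom is a fraction of the line rate, e.g. 0.10 reserves 10%.
    """
    if not 0.0 <= headroom < 1.0:
        raise ValueError("headroom must be in the range [0, 1)")
    return line_rate_mbit * (1.0 - headroom)


# Plan rates quoted in this post (Mbit/s); headroom values are illustrative.
DOWN_MBIT = 1000
UP_MBIT = 40

for headroom in (0.05, 0.10, 0.15):
    down = pipe_bandwidth_mbit(DOWN_MBIT, headroom)
    up = pipe_bandwidth_mbit(UP_MBIT, headroom)
    print(f"headroom {headroom:.0%}: download pipe {down:.0f} Mbit/s, "
          f"upload pipe {up:.1f} Mbit/s")
```

With only 40Mbps of upload, each extra 5% of headroom costs 2Mbit/s, which is why there is not much room to play with on that side.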

Now I'm getting an A in Waveform, an A in LibreQoS, but an F in the Household test (was previously getting a B there).  I'm confused by that Household test now as well.  It's not showing any packet loss.

This test tool is very nice. It provides a lot of statistics and tests a variety of traffic types & patterns.

It's created by people involved in the bufferbloat community, basically the people responsible for the CoDel RFC, CAKE, and LibreQoS, the latest iteration of CAKE for ISPs.

For those who didn't know, Dave Täht, one of the original creators of CoDel & CAKE (AQMs), sadly passed away in 2025. LibreQoS and the bufferbloat initiative are his legacy.

In loving memory of Dave Täht