Hello,
I have discovered a strange problem using the Weighted Fair Queueing scheduler. The greater the bandwidth of the pipe, the less the weight matters. This behaviour starts at a pipe bandwidth of about 50 Mbit/s.
The following measurements show the problem.
Pipe Bandwidth | Host1 Bandwidth (Weight 100) | Host2 Bandwidth (Weight 1)
10 Mbit/s      | 9.56 Mbit/s                  | 100 Kbit/s
20 Mbit/s      | 19.1 Mbit/s                  | 185 Kbit/s
30 Mbit/s      | 28.7 Mbit/s                  | 290 Kbit/s
40 Mbit/s      | 38.3 Mbit/s                  | 380 Kbit/s
50 Mbit/s      | 47.7 Mbit/s                  | 740 Kbit/s
60 Mbit/s      | 55.3 Mbit/s                  | 2.46 Mbit/s
70 Mbit/s      | 55.8 Mbit/s                  | 12.1 Mbit/s
80 Mbit/s      | 55.9 Mbit/s                  | 22.2 Mbit/s
90 Mbit/s      | 55.8 Mbit/s                  | 31.5 Mbit/s
100 Mbit/s     | 55.8 Mbit/s                  | 41.1 Mbit/s
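For reference: with weights 100 and 1, WFQ should give Host1 about 100/101 ≈ 99% and Host2 about 1/101 ≈ 1% of the pipe when both queues are saturated. That fits the 10 Mbit/s row (expected ~99 Kbit/s for Host2, measured 100 Kbit/s), but at 100 Mbit/s Host2 should get about 0.99 Mbit/s, not the measured 41.1 Mbit/s.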
The bandwidth tests were performed with iperf3. The test setup consists of one pipe, two queues and two rules.

Firewall -> Shaper -> Pipes -> New Pipe
Bandwidth: 100 (adjusted for each test as seen above)
Metric: Mbit/s
Mask: none
Description: PipeDown

Firewall -> Shaper -> Queues -> New Queue
Pipe: PipeDown
Weight: 100
Mask: none
Description: QueueDownHigh

Firewall -> Shaper -> Queues -> New Queue
Pipe: PipeDown
Weight: 1
Mask: none
Description: QueueDownLow
Rule "ShapeDownHigh" assigns traffic from Host1 to the queue "QueueDownHigh".
Rule "ShapeDownLow" assigns traffic from Host2 to the queue "QueueDownLow".
Does anyone have an idea what the problem might be? Does a similar setup work for anyone?
Interesting, I'll try to reproduce.
What happens when you switch to QFQ instead of WFQ?
Thank you for your response! But unfortunately no difference.
Any chance to test with 20.1?
Are you sure host1 is able to saturate the pipe? What happens when you remove the second queue? Does host1 deliver 100 Mbit/s?
@hkp Sure. Then I get around 870 Mbit/s.
@mimugmail Yes. I have just downloaded OPNsense 20.1 and I will set up the test environment shortly.
Unfortunately exactly the same problem exists in OPNsense 20.1.
Edit: And also in OPNsense 19.1.4 :-\
This is good, at least it's not another problem caused by 20.7 .. I'll test next week.
Are there any news here? Can anyone try to reproduce this problem?
push ;)
I think it's a bug in ipfw/dummynet. The same behaviour appears in the Limiter of pfSense.
Quote from: nuwe70 on April 26, 2021, 09:31:14 PM
I think it's a bug in ipfw/dummynet. The same behaviour appears in the Limiter of pfSense.
Do you have a link for this?
Quote from: mimugmail on April 27, 2021, 10:19:49 PM
Do you have a link for this?
I read that the framework behind the pfSense Limiter is the same traffic shaper technology that OPNsense uses. So I have set up a new pfSense VM and tested it myself. The result is the same as described in the first post. Bandwidth without the Limiter enabled: 2.30 Gbit/s.
The config screenshots are attached. In addition, there are two rules that redirect traffic from two different hosts to queueLow and queueHigh.
Do you know if this is tracked somewhere at pfSense (forum or Redmine)?
No, unfortunately I have not found anything there. :-\
Maybe we need queue buffers! Check this article:
https://klarasystems.com/articles/dummynet-the-better-way-of-building-freebsd-networks/
I have read the article and tested with a higher queue buffer. There is no difference.
If I understand it correctly, the queue buffer in the article was necessary because a single host was not able to max out the bandwidth of the pipe. That is not a problem for me.
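(In case someone wants to repeat that test from the shell: in raw dummynet the queue length can be raised with something like the following, where the pipe/queue numbers are placeholders; the default is 50 slots.)

ipfw queue 1 config pipe 1 weight 100 queue 100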
If two hosts use the same pipe but different queues and weights, the weights are increasingly ignored as the pipe bandwidth rises above about 50 Mbit/s. For example, with a pipe bandwidth of 100 Mbit/s, both hosts get almost the same bandwidth even though one host has a weight of 1 and the other a weight of 100. The weights are honoured at lower pipe bandwidths of, for example, 10 Mbit/s.
Hm, seems I need to rebuild my lab to check this.
That would be great, thanks in advance.
Ok, this is my lab now. Can you please describe, as detailed as possible, what should be tested, using the IPs/names from the diagram:
+--------+                        +--------+
|  FW-A  |      10.255.255.0      |  FW-B  |
|        |------------------------|        |
+--------+                        +--------+
   |  |                              |  |
   |  |  192.168.10.0  192.168.11.0  |  |
   |  |                              |  |
+--------+  +--------+      +--------+  +--------+
| Deb-A1 |  | Deb-A2 |      | Deb-B1 |  | Deb-B2 |
+--------+  +--------+      +--------+  +--------+
   .201        .202            .201        .202
Set up FW-A as follows.
Firewall -> Shaper -> Pipes -> New Pipe
Bandwidth: 10 Mbit/s
Description: Pipe
Firewall -> Shaper -> Queues -> New Queue
Pipe: Pipe
Weight: 100
Description: QueueHigh
Firewall -> Shaper -> Queues -> New Queue
Pipe: Pipe
Weight: 1
Description: QueueLow
Firewall -> Shaper -> Rules -> New Rule
Sequence: 1
Interface: WAN
Source: 192.168.10.201
Destination: any
Target: QueueLow
Description: RuleLow
Firewall -> Shaper -> Rules -> New Rule
Sequence: 2
Interface: WAN
Source: 192.168.10.202
Destination: any
Target: QueueHigh
Description: RuleHigh
Set up iperf3 server
Run two iperf3 servers on two different ports on the WAN side, for example on 10.255.255.10 with
iperf3 -s -p 5000
iperf3 -s -p 5001
Test 1
Check if a single host (for example 192.168.10.201) is limited to 10 Mbit/s with
iperf3 -c 10.255.255.10 -p 5000
If so, run iperf clients on both hosts in parallel:
192.168.10.201:
iperf3 -c 10.255.255.10 -p 5000 -t 60
192.168.10.202:
iperf3 -c 10.255.255.10 -p 5001 -t 60
Check if 192.168.10.201 is using about 1% of the bandwidth and 192.168.10.202 about 99% of the bandwidth.
Test 2
Edit Pipe to
Bandwidth: 100 Mbit/s
Run the parallel iperf test again: now 192.168.10.201 and 192.168.10.202 get almost the same bandwidth. But this is not expected!
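To cross-check what dummynet is actually doing while the test runs, the live pipe and queue state can be inspected on the firewall shell with the standard ipfw commands (assuming shell access):

ipfw pipe show
ipfw queue show
ipfw sched show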
The lab is/was at a VPS hoster .. the values fluctuated too much. I need to set one up on real hardware.
Maybe on Friday when I'm in the office.
Thanks for your help! Let me know if I can help in any way.
Have you had time to reproduce this behavior on real hardware?
No, currently I have a lab with two OPNsense boxes directly connected and one client on each side. Should this also be possible with multiple streams?
Yes, I think so. Then you have to change the rules to match the iperf server ports instead of the entire host.
Edit:
I just tested it. Based on post #20 you have to change the following.
Firewall -> Shaper -> Rules -> New Rule
Sequence: 1
Interface: WAN
Source: 192.168.10.201
Destination: any
Dst-port: 5000
Target: QueueLow
Description: RuleLow
Firewall -> Shaper -> Rules -> New Rule
Sequence: 2
Interface: WAN
Source: 192.168.10.201
Destination: any
Dst-port: 5001
Target: QueueHigh
Description: RuleHigh
And run the following two commands in parallel on host 192.168.10.201 (one per port, so the streams land in different queues).
iperf3 -c 10.255.255.10 -p 5000 -t 60
iperf3 -c 10.255.255.10 -p 5001 -t 60
Holy shit! :D For more than a year I have had this problem. Now I have figured out what causes this behavior.
I always used virtual machines to run OPNsense. The problem only exists in virtual machines, not on real hardware. The reason is that the OS uses a different value for the kernel parameter "kern.hz" in virtual environments. This parameter sets the kernel interval timer rate and affects, for example, dummynet and ZFS. The default value in a VM is 100, but on real hardware it is 1000. Since dummynet is driven by this timer, a tick of 10 ms (HZ=100) is apparently too coarse to schedule high-bandwidth pipes precisely; at 100 Mbit/s, roughly 125 KB would have to be released per tick.
So the solution is to set the kernel parameter to a higher value, for example 1000. The higher the bandwidth, the higher the value must be.
Go to System -> Settings -> Tunables and add a new entry
Tunable: kern.hz
Description: Set the kernel interval timer rate
Value: 1000
More information here: https://groups.google.com/g/mailing.freebsd.ipfw/c/oVbFsI3JqfM
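Note: kern.hz is a boot-time (loader) tunable, so the new value only takes effect after a reboot. On a plain FreeBSD box the equivalent steps would be (a sketch):

sysctl kern.hz                              # show the active timer rate
echo 'kern.hz="1000"' >> /boot/loader.conf  # set it for the next boot
shutdown -r now                             # reboot, then check sysctl kern.hz again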