I'm noticing huge throughput differences between development and production. I had the shaper configured to improve bufferbloat on a 400 Mbit cable pipe from Spectrum. Where I was previously getting ~350 Mbit down/~20 Mbit up, I get ~50 Mbit down/~20 Mbit up on dev. I also had major issues with a Zoom call last night where the video was buffering and dropping.
On the Waveform bufferbloat test, I was previously getting +7 ms down/+0 ms up with my shaper config on prod; on dev I'm getting ~+26 ms down/~+7 ms up, and the bandwidth takes a nosedive.
Update:
I've adjusted the pipes a bit and am seeing less latency, but the why isn't making much sense to me.
Dropped the bandwidth on the download pipe to 200 Mbit (from 360). Removed the queue, FQ-CoDel limit, and FQ-CoDel flows values. I'm getting a much more stable rating on the Waveform bufferbloat test, but obviously had to give up a lot of bandwidth. It also seems like downloads are going waaaay slower.
After a little more testing, it looks like the major hit came from RSS being enabled. I've disabled RSS. Still doing some tuning to figure out how to get the performance back where it should be.
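For anyone else chasing this, a quick way to check from a shell whether RSS is actually active - treat this as a sketch, the tunable names are the ones from the OPNsense RSS docs and may not all exist on every kernel/driver:

  sysctl net.inet.rss.enabled net.inet.rss.bits 2>/dev/null   # 1 plus a bit count means RSS is on
  sysctl net.isr.bindthreads net.isr.maxthreads               # netisr thread binding that goes with it
  netstat -Q | head -25                                       # how packets are dispatched across netisr queues/cores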
fq_codel works fine for me on a quad core machine with RSS enabled.
I'll keep an eye out for oddities; the only thing completely broken is fq_pie, which crashes my machine.
How did you configure the shaper?
Mine is shaping downstream from ~400 Mbit/s to 380 Mbit/s (real world: 365 Mbit/s) and upstream 210 Mbit/s to 195 Mbit/s (real world: 186 Mbit/s)
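For reference, that works out to shaping to roughly 95% of the nominal rate downstream and about 93% upstream (quick check with bc):

  echo "scale=3; 380/400" | bc   # .950 -> downstream pipe at 95% of nominal
  echo "scale=3; 195/210" | bc   # .928 -> upstream pipe at ~93% of nominal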
Here's what I currently have set up. If you see anything out of sorts, please let me know. I've tried a variety of different values for the queues and FQ-CoDel parameters on the download pipe and haven't found an optimal configuration yet. I've also tried switching from NewReno to HTCP in the tunables to see if that makes any difference.
Pipes:
  Upload:
    bandwidth: 20 Mbit/s
    mask: none
    scheduler: FlowQueue-CoDel
    (FQ-)CoDel ECN: checked
  Download:
    bandwidth: 360 Mbit/s
    queues: 2
    mask: none
    scheduler: FlowQueue-CoDel
    FQ-CoDel quantum: 1080
    FQ-CoDel limit: 1000

Queues ((FQ-)CoDel ECN checked on all queues):
  Upstream queue: Upstream pipe, weight 1, mask source
  Upstream high priority: Upstream pipe, weight 10, mask source
  Downstream queue: Downstream pipe, weight 1, mask destination
  Downstream high priority queue: Downstream pipe, weight 10, mask destination

Rules:
  1: WAN, udp, <firewall and pihole IPs> -> any, out, target: Upstream high priority ("DNS High Priority")
  2: WAN, tcp (ACK packets only), any -> any, out, target: Upstream high priority ("Upload ACK")
  3: WAN, ipv4, any -> any, out, target: Upstream queue ("Upstream")
  4: WAN, tcp (ACK packets only), any -> any, in, target: Downstream high priority queue ("Downstream high priority")
  5: WAN, ipv4, any -> any, in, target: Downstream queue ("Downstream")
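For completeness: since the shaper sits on top of ipfw/dummynet, the generated state can also be dumped from a shell to confirm the GUI settings actually landed (a sketch - the exact output format varies by version):

  ipfw pipe show    # bandwidth, delay and burst per pipe
  ipfw sched show   # scheduler type and FQ-CoDel parameters (target, interval, quantum, limit, flows, ECN)
  ipfw queue show   # queue -> pipe mapping, weight and mask
  ipfw show         # the generated shaper rules with packet/byte counters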
Tunables (name, description, type, value):
debug.pfftpproxy Disable the pf ftp proxy handler. unsupported unknown
dev.igb.0.eee_disabled unsupported 1
dev.igb.1.eee_disabled unsupported 1
hw.igb.num_queues unsupported 0
hw.igb.rx_process_limit unsupported -1
hw.igb.tx_process_limit unsupported -1
hw.syscons.kbd_reboot Disable CTRL+ALT+Delete reboot from keyboard. runtime default (0)
kern.ipc.maxsockbuf Maximum socket buffer size runtime default (4262144)
kern.randompid Randomize PID's (see src/sys/kern/kern_fork.c: sysctl_kern_randompid()) runtime default (1)
legal.intel_igb.license_ack unsupported 1
net.inet.icmp.drop_redirect Redirect attacks are the purposeful mass-issuing of ICMP type 5 packets. In a normal network, redirects to the end stations should not be required. This option enables the NIC to drop all inbound ICMP redirect packets without returning a response. runtime 1
net.inet.icmp.icmplim Set ICMP Limits runtime default (0)
net.inet.icmp.log_redirect This option turns off the logging of redirect packets because there is no limit and this could fill up your logs consuming your whole hard drive. runtime default (0)
net.inet.ip.accept_sourceroute Source routing is another way for an attacker to try to reach non-routable addresses behind your box. It can also be used to probe for information about your internal networks. These functions come enabled as part of the standard FreeBSD core system. runtime default (0)
net.inet.ip.fastforwarding IP Fastforwarding unsupported unknown
net.inet.ip.intr_queue_maxlen Maximum size of the IP input queue runtime default (1000)
net.inet.ip.portrange.first Set the ephemeral port range to be lower. runtime default (1024)
net.inet.ip.random_id Randomize the ID field in IP packets (default is 0: sequential IP IDs) runtime default (1)
net.inet.ip.redirect Enable sending IPv4 redirects runtime 0
net.inet.ip.sourceroute Source routing is another way for an attacker to try to reach non-routable addresses behind your box. It can also be used to probe for information about your internal networks. These functions come enabled as part of the standard FreeBSD core system. runtime default (0)
net.inet.tcp.blackhole Drop packets to closed TCP ports without returning a RST runtime default (2)
net.inet.tcp.cc.algorithm Default congestion control algorithm runtime htcp
net.inet.tcp.cc.htcp.adaptive_backoff unsupported 1
net.inet.tcp.cc.htcp.rtt_scaling unsupported 1
net.inet.tcp.delayed_ack Do not delay ACK to try and piggyback it onto a data packet runtime default (0)
net.inet.tcp.drop_synfin Drop SYN-FIN packets (breaks RFC1379, but nobody uses it anyway) runtime default (1)
net.inet.tcp.log_debug Enable TCP extended debugging runtime default (0)
net.inet.tcp.recvbuf_max Max size of automatic receive buffer runtime 4194304
net.inet.tcp.recvspace Maximum incoming/outgoing TCP datagram size (receive) runtime default (65228)
net.inet.tcp.sendbuf_max Max size of automatic send buffer runtime 4194304
net.inet.tcp.sendspace Maximum incoming/outgoing TCP datagram size (send) runtime default (65228)
net.inet.tcp.syncookies Generate SYN cookies for outbound SYN-ACK packets runtime default (1)
net.inet.tcp.tso TCP Offload Engine runtime default (1)
net.inet.udp.blackhole Do not send ICMP port unreachable messages for closed UDP ports runtime default (1)
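And to confirm which runtime values actually applied, the same names can be queried directly:

  sysctl net.inet.tcp.cc.algorithm   # should report htcp (it only shows up if the cc_htcp module is loaded)
  sysctl net.inet.tcp.cc.available   # congestion control algorithms currently available
  sysctl net.inet.tcp.sendbuf_max net.inet.tcp.recvbuf_max kern.ipc.maxsockbuf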
This does not look particularly non-standard to me, besides the custom values for some of the CoDel parameters.
I'm running a somewhat similar setup.
Weird - maybe Franco or someone else can figure out what's going on here.
Can you monitor CPU Usage during the shaping?
Maybe the overhead increased for some reason and the appliance / host is not capable of dealing with this stress.
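In case it helps, a couple of stock FreeBSD commands for spotting that while a speed/bufferbloat test is running:

  top -P -S -H   # per-CPU usage plus kernel threads; a single core pegged at 100% points at a bottleneck
  vmstat -i      # interrupt counts and rates per device/queue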
Quote from: bbin on November 13, 2021, 04:28:36 PM
FQ-CoDel quantum 1080
FQ-CoDel limit 1000
Try removing these - for me, with the download pipe set at 930:
Download: 580.97 Mbps (data used: 942.1 MB )
Upload: 44.11 Mbps (data used: 29.5 MB )
Removing the various codel limits:
Download: 894.78 Mbps (data used: 1.0 GB )
Upload: 44.34 Mbps (data used: 25.2 MB )
My cable is technically 1152Mbps to the modem, but I don't have the modem with a 2.5G ethernet port yet, so I'm aiming to avoid overloading the internal 1G interface rather than the modem/connection itself.
Also, I think you may have some legacy settings - the two below. I have some igb interfaces (just upgraded to a new box) and don't see these listed in sysctl -A?
hw.igb.rx_process_limit unsupported -1
hw.igb.tx_process_limit unsupported -1
I also have RSS enabled, FC disabled on all NICs, 8 core i7-9700 without HT.
P.S. I also have a 1 ms delay added to both pipes, as I believe in earlier versions I was seeing out-of-order packets (and net.inet.ip.dummynet.io_fast=0 had no effect). I haven't checked the source for FreeBSD 13 yet:
https://redmine.pfsense.org/issues/11192
So after a bit more playing, this seems to work for me - the connection is 1152/52, but with a 1G interface on the inside of the modem I set my pipes to 930/50:
- Remove all limits
- ECN on both down/up pipes
- Set Quantum to 4000 on the down pipe
- Set Quantum to 100 on the upload pipe
- Delay of 1ms on each pipe
Gets me back to A+ on the Waveform test, with a speed of around 900/45.
I was previously seeing similar results with limit/quantum at 1000 for the download... but seemingly not since the change to this release/FreeBSD 13... now to find out why...
EDIT: Based on the below:
"Life gets dicy in 12x1 as quantums below 256 disable a few key things efq_codel does to get good (for example) VOIP performance. Stay at 300 or above."
I have upped the upload pipe quantum to 300, with no side effects to bufferbloat.
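For the CLI-minded, in raw ipfw/dummynet terms the above works out to something roughly like this - a hand-written sketch with arbitrary pipe/scheduler numbers, not what the GUI literally generates, and the rules that direct traffic into the pipes are left out:

  # download: 930 Mbit/s, 1 ms delay, FQ-CoDel with quantum 4000 and ECN, no custom limit
  ipfw pipe 1 config bw 930Mbit/s delay 1
  ipfw sched 1 config pipe 1 type fq_codel quantum 4000 ecn
  # upload: 50 Mbit/s, 1 ms delay, FQ-CoDel with quantum 300 (per the edit above) and ECN
  ipfw pipe 2 config bw 50Mbit/s delay 1
  ipfw sched 2 config pipe 2 type fq_codel quantum 300 ecn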
Interesting topic. I had problems dialing in a good compromise between throughput and bufferbloat a couple of months ago, where I was getting a "C". Then I found the page below. It was a much less complicated setup than what I started with and it got me a consistent A (not A+) with 200/10 service from Spectrum. My settings are the same, but my pipe bandwidth settings are 240/12.
https://maltechx.de/en/2021/03/opnsense-setup-traffic-shaping-and-reduce-bufferbloat
https://www.waveform.com/tools/bufferbloat?test-id=cbbcae07-f966-47ab-8708-7ccd4acacb53
Also, I noticed no difference between 21.7.7 and 22.1.r1; also have RSS enabled.
Interesting, in the link they ended up with more or less what I did. :)
However, whilst I might be wrong, I was under the assumption that it reserved some bandwidth for new connections/sessions?
So maybe with more concurrent bandwidth demands from devices, you might find the connection overloaded (and bufferbloat coming back again) if you're close to the upper end of the available bandwidth?
With the maximum throughput of gigabit ethernet at around 940Mbps, I'd perhaps question the 1000Mbits set in the link - unless the cable modem has a 2.5Gb port on the inside.... which is what I'm waiting for :)
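For reference, the usual back-of-the-envelope behind that ~940 figure, assuming a standard 1500-byte MTU and IPv4+TCP headers without options:

  # a 1500-byte frame carries 1460 bytes of TCP payload but occupies 1538 bytes on the wire
  # (preamble 8 + ethernet header 14 + FCS 4 + inter-frame gap 12)
  echo "scale=1; 1460 * 1000 / 1538" | bc   # ~949 Mbit/s theoretical TCP goodput; ~940 in practice once ACKs etc. are counted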
Quote: "With the maximum throughput of gigabit ethernet at around 940Mbps, I'd perhaps question the 1000Mbits set in the link"
Not sure, but his results at the bottom of that page show he's pulling only 751 Mb/s (and only 837 prior) with a 1000 setting (bottleneck somewhere?). Tweaking that setting here, I've tried to dial it in so I can maximize bandwidth for a single host while presumably still letting smaller requests through quickly. But you have a good point - I'll try running the BB test on a couple of hosts concurrently; curious to see those results.
Might be worthwhile experimenting with the 1ms delay on the pipes - to prevent out of order packets - I found it helped. Unless you're one of these gamers that goes nuts over 1ms ;)
It seems the option to disable IO Fast still doesn't do anything, as it is commented out at line 946 in the code (same as in earlier versions of FreeBSD):
https://github.com/opnsense/src/blob/stable/13/sys/netpfil/ipfw/ip_dn_io.c
As the chap in the pfsense post mentions originally:
"Since net.inet.ip.dummynet.io_fast does split path of packets for saturated/unsaturated pipe mode, then this setting is likely to be responsible for packet reordering. (traffic is very bursty for TCP without pacing or IPERF3 UDP test, so saturation/desaturation of pipe occurs several times in one second, so it seems then we get reorders on every transition)
But setting of net.inet.ip.dummynet.io_fast=0 has no effect, net.inet.ip.dummynet.io_pkt_fast is still increasing. Explanation is very simple:
io_fast check is commented in dummynet source code:
if (/*dn_cfg.io_fast &&*/ m == *m0 && (dir & PROTO_LAYER2) == 0 ) {"
Thanks @iMx, added the 1 ms and will do further testing later. I wonder whether RSS solves the out-of-order issue, or whether the reordering isn't happening at the core-assignment level at all.
My understanding is that with net.inet.ip.dummynet.io_fast enabled, as it is by default, packets are forwarded directly when the pipe bandwidth is not exhausted, without actually going through the shaper.
The problem arises when you're at capacity/saturation: some packets will be forwarded 'fast', some won't, and as quoted above:
'saturation/desaturation of pipe occurs several times in one second so it seems then we get reorders on every transition'
Ideally we could turn io_fast off to force all packets through the shaper all the time, but that check is commented out in the code. So the only way to work around it is to make sure the fast path is never taken - adding a 1 ms delay to all pipes makes that happen.
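One way to sanity-check that, using the same counter mentioned in the quote above: watch net.inet.ip.dummynet.io_pkt_fast while traffic is flowing - with the 1 ms delay in place it should stop incrementing, since nothing takes the fast path any more.

  sysctl net.inet.ip.dummynet.io_pkt_fast ; sleep 5 ; sysctl net.inet.ip.dummynet.io_pkt_fast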
That makes sense, thanks for the explanation.
I ended up getting the best performance by limiting my pipes to just below what my ISP is supposed to deliver, in combination with the settings below. My ISP is Verizon FIOS and I can get at best 940/880, so my limits are set to 900/850.
Download + Upload pipe settings (settings not mentioned are left blank):
Scheduler type: FlowQueue-CoDel
Yup, just the scheduler change did it for me on OPNsense 22.1.7_1-amd64