I'm noticing huge throughput differences between development and production. I had the shaper configured to improve bufferbloat on a 400 Mbit cable pipe from Spectrum. Where I was previously getting ~350 Mbit down/~20 Mbit up, I get ~50 Mbit down/~20 Mbit up on dev. I also had major issues with a Zoom call last night where the video was buffering and dropping.
On the Waveform bufferbloat test, I was previously getting +7 ms down/+0 ms up with my shaper config on prod; on dev I'm getting ~+26 ms down/~+7 ms up, and the bandwidth takes a nosedive.
Update:
I've adjusted the pipes a bit and am seeing less latency, but the why isn't making much sense to me.
Dropped the bandwidth on the download pipe to 200 Mbit (from 360). Removed the queue, FQ-CoDel limit, and FQ-CoDel flows values. I'm getting a much more stable rating on the Waveform bufferbloat test, but obviously had to give up a lot of bandwidth. It also seems like downloads are going waaaay slower.
After a little more testing, it looks like the major hit came from RSS being enabled. I've disabled RSS. Still doing some tuning to figure out how to get the performance back where it should be.
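For anyone else chasing this, a quick way to check from a shell whether RSS is actually active - treat this as a sketch, the tunable names are the ones from the OPNsense RSS docs and may not all exist on every kernel/driver:

  sysctl net.inet.rss.enabled net.inet.rss.bits 2>/dev/null   # 1 plus a bit count means RSS is on
  sysctl net.isr.bindthreads net.isr.maxthreads               # netisr thread binding that goes with it
  netstat -Q | head -25                                       # how packets are dispatched across netisr queues/cores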
fq_codel works fine for me on a quad core machine with RSS enabled.
I'll keep an eye out for oddities; the only thing completely broken is fq_pie, which crashes my machine.
How did you configure the shaper?
Mine is shaping downstream from ~400 Mbit/s to 380 Mbit/s (real world: 365 Mbit/s) and upstream 210 Mbit/s to 195 Mbit/s (real world: 186 Mbit/s)
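For reference, that works out to shaping to roughly 95% of the nominal rate downstream and about 93% upstream (quick check with bc):

  echo "scale=3; 380/400" | bc   # .950 -> downstream pipe at 95% of nominal
  echo "scale=3; 195/210" | bc   # .928 -> upstream pipe at ~93% of nominal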
Here's what I currently have set up. If you see anything out of sorts, please let me know. I've tried a variety of different values for the queues and FQ-CoDel parameters on the download pipe and haven't found an optimal configuration yet. I've also tried switching from NewReno to HTCP in the tunables to see if that makes any difference.
Pipes:
  Upload:
    bandwidth: 20 Mbit/s
    mask: none
    scheduler: FlowQueue-CoDel
    (FQ-)CoDel ECN: checked
  Download:
    bandwidth: 360 Mbit/s
    queues: 2
    mask: none
    scheduler: FlowQueue-CoDel
    FQ-CoDel quantum: 1080
    FQ-CoDel limit: 1000

Queues ((FQ-)CoDel ECN checked on all queues):
  Upstream queue: Upstream pipe, weight 1, mask source
  Upstream high priority: Upstream pipe, weight 10, mask source
  Downstream queue: Downstream pipe, weight 1, mask destination
  Downstream high priority queue: Downstream pipe, weight 10, mask destination

Rules:
  1: WAN, udp, <firewall and pihole IPs> -> any, out, target: Upstream high priority ("DNS High Priority")
  2: WAN, tcp (ACK packets only), any -> any, out, target: Upstream high priority ("Upload ACK")
  3: WAN, ipv4, any -> any, out, target: Upstream queue ("Upstream")
  4: WAN, tcp (ACK packets only), any -> any, in, target: Downstream high priority queue ("Downstream high priority")
  5: WAN, ipv4, any -> any, in, target: Downstream queue ("Downstream")
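For completeness: since the shaper sits on top of ipfw/dummynet, the generated state can also be dumped from a shell to confirm the GUI settings actually landed (a sketch - the exact output format varies by version):

  ipfw pipe show    # bandwidth, delay and burst per pipe
  ipfw sched show   # scheduler type and FQ-CoDel parameters (target, interval, quantum, limit, flows, ECN)
  ipfw queue show   # queue -> pipe mapping, weight and mask
  ipfw show         # the generated shaper rules with packet/byte counters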
Tunables (name, description, type, value):
debug.pfftpproxy Disable the pf ftp proxy handler. unsupported unknown
dev.igb.0.eee_disabled unsupported 1
dev.igb.1.eee_disabled unsupported 1
hw.igb.num_queues unsupported 0
hw.igb.rx_process_limit unsupported -1
hw.igb.tx_process_limit unsupported -1
hw.syscons.kbd_reboot Disable CTRL+ALT+Delete reboot from keyboard. runtime default (0)
kern.ipc.maxsockbuf Maximum socket buffer size runtime default (4262144)
kern.randompid Randomize PID's (see src/sys/kern/kern_fork.c: sysctl_kern_randompid()) runtime default (1)
legal.intel_igb.license_ack unsupported 1
net.inet.icmp.drop_redirect Redirect attacks are the purposeful mass-issuing of ICMP type 5 packets. In a normal network, redirects to the end stations should not be required. This option enables the NIC to drop all inbound ICMP redirect packets without returning a response. runtime 1
net.inet.icmp.icmplim Set ICMP Limits runtime default (0)
net.inet.icmp.log_redirect This option turns off the logging of redirect packets because there is no limit and this could fill up your logs consuming your whole hard drive. runtime default (0)
net.inet.ip.accept_sourceroute Source routing is another way for an attacker to try to reach non-routable addresses behind your box. It can also be used to probe for information about your internal networks. These functions come enabled as part of the standard FreeBSD core system. runtime default (0)
net.inet.ip.fastforwarding IP Fastforwarding unsupported unknown
net.inet.ip.intr_queue_maxlen Maximum size of the IP input queue runtime default (1000)
net.inet.ip.portrange.first Set the ephemeral port range to be lower. runtime default (1024)
net.inet.ip.random_id Randomize the ID field in IP packets (default is 0: sequential IP IDs) runtime default (1)
net.inet.ip.redirect Enable sending IPv4 redirects runtime 0
net.inet.ip.sourceroute Source routing is another way for an attacker to try to reach non-routable addresses behind your box. It can also be used to probe for information about your internal networks. These functions come enabled as part of the standard FreeBSD core system. runtime default (0)
net.inet.tcp.blackhole Drop packets to closed TCP ports without returning a RST runtime default (2)
net.inet.tcp.cc.algorithm Default congestion control algorithm runtime htcp
net.inet.tcp.cc.htcp.adaptive_backoff unsupported 1
net.inet.tcp.cc.htcp.rtt_scaling unsupported 1
net.inet.tcp.delayed_ack Do not delay ACK to try and piggyback it onto a data packet runtime default (0)
net.inet.tcp.drop_synfin Drop SYN-FIN packets (breaks RFC1379, but nobody uses it anyway) runtime default (1)
net.inet.tcp.log_debug Enable TCP extended debugging runtime default (0)
net.inet.tcp.recvbuf_max Max size of automatic receive buffer runtime 4194304
net.inet.tcp.recvspace Maximum incoming/outgoing TCP datagram size (receive) runtime default (65228)
net.inet.tcp.sendbuf_max Max size of automatic send buffer runtime 4194304
net.inet.tcp.sendspace Maximum incoming/outgoing TCP datagram size (send) runtime default (65228)
net.inet.tcp.syncookies Generate SYN cookies for outbound SYN-ACK packets runtime default (1)
net.inet.tcp.tso TCP Offload Engine runtime default (1)
net.inet.udp.blackhole Do not send ICMP port unreachable messages for closed UDP ports runtime default (1)
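And to confirm which runtime values actually applied, the same names can be queried directly:

  sysctl net.inet.tcp.cc.algorithm   # should report htcp (it only shows up if the cc_htcp module is loaded)
  sysctl net.inet.tcp.cc.available   # congestion control algorithms currently available
  sysctl net.inet.tcp.sendbuf_max net.inet.tcp.recvbuf_max kern.ipc.maxsockbuf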
This does not look particularly non-standard to me, besides the custom values for some of the CoDel parameters.
I'm running a somewhat similar setup.
Weird - maybe Franco or someone else can figure out what's going on here.
Can you monitor CPU Usage during the shaping?
Maybe the overhead increased for some reason and the appliance / host is not capable of dealing with this stress.
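In case it helps, a couple of stock FreeBSD commands for spotting that while a speed/bufferbloat test is running:

  top -P -S -H   # per-CPU usage plus kernel threads; a single core pegged at 100% points at a bottleneck
  vmstat -i      # interrupt counts and rates per device/queue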
Quote from: bbin on November 13, 2021, 04:28:36 PM
FQ-CoDel quantum 1080
FQ-CoDel limit 1000
Try removing these - for me, with the download pipe set at 930:
Download: 580.97 Mbps (data used: 942.1 MB )
Upload: 44.11 Mbps (data used: 29.5 MB )
Removing the various codel limits:
Download: 894.78 Mbps (data used: 1.0 GB )
Upload: 44.34 Mbps (data used: 25.2 MB )
My cable is technically 1152Mbps to the modem, but I don't have the modem with a 2.5G ethernet port yet, so I'm aiming to avoid overloading the internal 1G interface rather than the modem/connection itself.
Also, I think you may have some legacy settings - the two below. I have some igb interfaces (just upgraded to a new box) and don't see these listed in sysctl -A?
hw.igb.rx_process_limit unsupported -1
hw.igb.tx_process_limit unsupported -1
I also have RSS enabled, FC disabled on all NICs, 8 core i7-9700 without HT.
P.S. I also have a 1 ms delay added to both pipes, as I believe in earlier versions I was seeing out-of-order packets (and net.inet.ip.dummynet.io_fast=0 had no effect). I haven't checked the source for FreeBSD 13 yet:
https://redmine.pfsense.org/issues/11192
So after a bit more playing, this seems to work for me - the connection is 1152/52, but with a 1G interface on the inside of the modem I set my pipes to 930/50:
- Remove all limits
- ECN on both down/up pipes
- Set Quantum to 4000 on the down pipe
- Set Quantum to 100 on the upload pipe
- Delay of 1ms on each pipe
Gets me back to A+ on the Waveform test, with a speed of around 900/45.
I was previously seeing similar results with limit/quantum at 1000 for the download... but seemingly not since the change to this release/FreeBSD 13... now to find out why...
EDIT: Based on the below:
"Life gets dicy in 12x1 as quantums below 256 disable a few key things efq_codel does to get good (for example) VOIP performance. Stay at 300 or above."
I have upped the upload pipe quantum to 300, with no side effects to bufferbloat.
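For the CLI-minded, in raw ipfw/dummynet terms the above works out to something roughly like this - a hand-written sketch with arbitrary pipe/scheduler numbers, not what the GUI literally generates, and the rules that direct traffic into the pipes are left out:

  # download: 930 Mbit/s, 1 ms delay, FQ-CoDel with quantum 4000 and ECN, no custom limit
  ipfw pipe 1 config bw 930Mbit/s delay 1
  ipfw sched 1 config pipe 1 type fq_codel quantum 4000 ecn
  # upload: 50 Mbit/s, 1 ms delay, FQ-CoDel with quantum 300 (per the edit above) and ECN
  ipfw pipe 2 config bw 50Mbit/s delay 1
  ipfw sched 2 config pipe 2 type fq_codel quantum 300 ecn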
Interesting topic. I had problems dialing in a good compromise between throughput and bufferbloat a couple of months ago, where I was getting a "C". Then I found the page below. It was a much less complicated setup than what I started with and it got me a consistent A (not A+) with 200/10 service from Spectrum. My settings are the same, but my pipe bandwidth settings are 240/12.
https://maltechx.de/en/2021/03/opnsense-setup-traffic-shaping-and-reduce-bufferbloat
https://www.waveform.com/tools/bufferbloat?test-id=cbbcae07-f966-47ab-8708-7ccd4acacb53
Also, I noticed no difference between 21.7.7 and 22.1.r1; also have RSS enabled.
Interesting, in the link they ended up with more or less what I did. :)
However, whilst I might be wrong, I was under the assumption that it reserved some bandwidth for new connections/sessions?
So maybe with more concurrent bandwidth demands from devices, you might find the connection overloaded (and bufferbloat coming back again) if you're close to the upper end of the available bandwidth?
With the maximum throughput of gigabit ethernet at around 940Mbps, I'd perhaps question the 1000Mbits set in the link - unless the cable modem has a 2.5Gb port on the inside.... which is what I'm waiting for :)
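For reference, the usual back-of-the-envelope behind that ~940 figure, assuming a standard 1500-byte MTU and IPv4+TCP headers without options:

  # a 1500-byte frame carries 1460 bytes of TCP payload but occupies 1538 bytes on the wire
  # (preamble 8 + ethernet header 14 + FCS 4 + inter-frame gap 12)
  echo "scale=1; 1460 * 1000 / 1538" | bc   # ~949 Mbit/s theoretical TCP goodput; ~940 in practice once ACKs etc. are counted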
Quote: "With the maximum throughput of gigabit ethernet at around 940Mbps, I'd perhaps question the 1000Mbits set in the link"
Not sure, but his results at the bottom of that page show he's pulling only 751 Mb/s (and only 837 prior) with a 1000 setting (bottleneck somewhere?). Tweaking that setting here, I've tried to dial it in so I can maximize bandwidth for a single host while presumably still letting smaller requests through quickly. But you have a good point - I'll try running the BB test on a couple of hosts concurrently; curious to see those results.
Might be worthwhile experimenting with the 1ms delay on the pipes - to prevent out of order packets - I found it helped. Unless you're one of these gamers that goes nuts over 1ms ;)
It seems the option to disable IO Fast still doesn't do anything, as it is commented out at line 946 in the code (same as in earlier versions of FreeBSD):
https://github.com/opnsense/src/blob/stable/13/sys/netpfil/ipfw/ip_dn_io.c
As the chap in the pfsense post mentions originally:
"Since net.inet.ip.dummynet.io_fast does split path of packets for saturated/unsaturated pipe mode, then this setting is likely to be responsible for packet reordering. (traffic is very bursty for TCP without pacing or IPERF3 UDP test, so saturation/desaturation of pipe occurs several times in one second, so it seems then we get reorders on every transition)
But setting of net.inet.ip.dummynet.io_fast=0 has no effect, net.inet.ip.dummynet.io_pkt_fast is still increasing. Explanation is very simple:
io_fast check is commented in dummynet source code:
if (/*dn_cfg.io_fast &&*/ m == *m0 && (dir & PROTO_LAYER2) == 0 ) {"
Thanks @iMx, added the 1 ms and will do further testing later. I wonder whether RSS solves the out-of-order issue, or whether the reordering isn't happening at the core-assignment level at all.
My understanding is that with net.inet.ip.dummynet.io_fast enabled, as it is by default, packets are forwarded directly when the pipe bandwidth is not exhausted, without actually going through the shaper.
The problem arises when you're at capacity/saturation: some packets will be forwarded 'fast', some won't, and as quoted above:
'saturation/desaturation of pipe occurs several times in one second so it seems then we get reorders on every transition'
Ideally we could turn io_fast off to force all packets through the shaper all the time, but that check is commented out in the code. So the only way to work around it is to make sure the fast path is never taken - adding a 1 ms delay to all pipes makes that happen.
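One way to sanity-check that, using the same counter mentioned in the quote above: watch net.inet.ip.dummynet.io_pkt_fast while traffic is flowing - with the 1 ms delay in place it should stop incrementing, since nothing takes the fast path any more.

  sysctl net.inet.ip.dummynet.io_pkt_fast ; sleep 5 ; sysctl net.inet.ip.dummynet.io_pkt_fast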
That makes sense, thanks for the explanation.
I ended up getting the best performance by limiting my pipes to just below what my ISP is supposed to deliver, in combination with the settings below. My ISP is Verizon FIOS and I can get at best 940/880, so my limits are set to 900/850.
Download + Upload pipe settings (settings not mentioned are left blank):
Scheduler type: FlowQueue-CoDel
Yup, just the scheduler change did it for me on OPNsense 22.1.7_1-amd64