Suggestions for troubleshooting slow NAT performance (throughput)?

Started by jwest, July 14, 2023, 08:07:37 PM

Retired, but I managed pfsense during most of my career at an ISP. For home I was using an APU model with pfsense, but when it died I needed something quick, so I got a Ubiquiti EdgeRouter X that I've been using for a year. I finally got around to replacing it, and several friends said I should look at opnsense before installing pfsense. After spending a few days with it, I'm happier with opnsense. One problem though....

WAN connection is to AT&T fiber via a BGW320-500 set to IP passthrough, speed 1 Gbps, static public IP.

When I first hooked up the Ubiquiti EdgeRouter X to the above connection, my throughput via several online broadband speed tests showed around 250 Mbps, well short of 1 Gbps. The moment I turned on hardware NAT offload, the same speed tests showed around 950 Mbps, and I've been getting that speed since.

I've built a box to replace the EdgeRouter X: an i7-3770S (4 cores/8 threads at 3.1 GHz), ASUS P8B75-M, 16 GB RAM, a 240 GB SSD, and two dual-port Intel PRO/1000 PCI NICs (4 ports total, 2 on each card) using the em driver, plus a built-in re0 that I may use for management or not at all. LAN is em0 and WAN is em2, so each is on a different card.

When I hook this up in place of the EdgeRouter X, all speed tests from LAN clients show about 250 Mbps. I tried turning on offloading for CRC, TSO, and LRO; retesting shows throughput is unchanged. I put the EdgeRouter X back in place of the new server and I'm back to about 950 Mbps, so it's something with the opnsense machine.

The setup is very simple - just NAT from LAN to WAN, a handful of static DHCP mappings, and a couple of port forwards. No other software is chewing up resources.

I saw this article, https://jeffmbelt.com/opnsense-1g-throughput.html, that may offer help, but I noticed several of the tunables don't exist, so perhaps it's outdated.

Can anyone point me in the right direction to begin troubleshooting why my throughput is tanking? I'd really love to stick with opnsense if possible.

Best,

J

What are you using to test?  The ISP site, speedtest.net, fast.com?

Can you put the new machine in between two computers and do an iperf3 test?

Can you do a vanilla install and test that without any changes?
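For the iperf3 part, something along these lines would do (addresses and options are just examples):

iperf3 -s                            (on one computer)
iperf3 -c <server-ip> -t 30 -P 4     (on the other, with traffic routed through the new machine)

Adding -R on the client side tests the opposite direction.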

I have tried several different speed test sites, including the ones you list. The results are all roughly the same, which suggests the problem is unlikely to be related to one test site vs. another. iperf3 is of course the definitive test, but I'm reasonably confident the tests above are indicative of the same issue.

Testbed locally - I can do this when I get back in town next week, but the test above shows a marked difference between the two hosts, regardless of how it's plumbed.

Well, not quite vanilla, as I'll have to assign some IPs :D The only things I've added are static DHCP mappings and a few port forwards. I can reinstall to do this, just to cross it off the list.

When I get back, I'll also verify whether the cards are being assigned different interrupts or are sharing one.
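I'm assuming something like this from the shell will show it:

vmstat -i | grep em

which should list the IRQ line each em port landed on.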

Best,

J
As soon as I have those results I'll post them here, thx.

Have you tried disabling all hardware offloading? Verify with ifconfig ...
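For example (em0 just as a sample interface):

ifconfig em0 | grep options

With everything disabled, the options= line should no longer list TXCSUM, RXCSUM, TSO4 or LRO.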
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

Hi Patrick - I think I may have seen you over on a NAS forum; if it's the same person, good to see you here as well :D

I believe all hardware offloading is already turned off by default, or are you referring to anything besides the three GUI selections for TSO, LRO, and CRC? If there's a different spot in the GUI or a tunable that I should check, I will do so.

Best,

J

Check with ifconfig whether the UI actually did disable TXCSUM and RXCSUM. I had abysmal performance on DigitalOcean droplets with FreeBSD and pf NAT until I disabled that - I can't say exactly which flag it was, because I ended up just disabling everything.
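If the UI changes didn't take, you can force it from the shell for a quick test (em0 as an example; this does not persist across reboots):

ifconfig em0 -rxcsum -txcsum -tso -lro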
Deciso DEC750
People who think they know everything are a great annoyance to those of us who do. (Isaac Asimov)

Sorry it took me so long to get back to this, life intervened....

To try and get to the bottom of this, I put a fresh opnsense install on the i7/new router and ran an iperf test between it and a Windows client on the same LAN (all ports are set to auto, all ports negotiated 1000MF). The Windows client is 172.30.30.40, and the opnsense router is 172.30.30.1. Results:

C:\Users\Admin\Downloads\iperf-3.1.3-win64\iperf-3.1.3-win64>iperf3 -c 172.30.30.1 -p 34102
Connecting to host 172.30.30.1, port 34102
[  4] local 172.30.30.40 port 51088 connected to 172.30.30.1 port 34102
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec  64.2 MBytes   539 Mbits/sec
[  4]   1.00-2.00   sec  67.4 MBytes   565 Mbits/sec
[  4]   2.00-3.00   sec  59.8 MBytes   501 Mbits/sec
[  4]   3.00-4.00   sec  58.6 MBytes   491 Mbits/sec
[  4]   4.00-5.00   sec  67.5 MBytes   567 Mbits/sec
[  4]   5.00-6.00   sec  68.1 MBytes   571 Mbits/sec
[  4]   6.00-7.00   sec  68.1 MBytes   572 Mbits/sec
[  4]   7.00-8.00   sec  58.5 MBytes   491 Mbits/sec
[  4]   8.00-9.00   sec  68.0 MBytes   570 Mbits/sec
[  4]   9.00-10.00  sec  60.2 MBytes   506 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-10.00  sec   640 MBytes   537 Mbits/sec                  sender
[  4]   0.00-10.00  sec   640 MBytes   537 Mbits/sec                  receiver

iperf Done.


This would not seem to be the correct throughput for a 1 Gb LAN link. I also tried with and without hardware offloading per Patrick's suggestion; the UI changes are reflected in ifconfig output, but bandwidth tests via broadband speed test sites still show about 250 Mbps. I'm not sure where to go with this other than replacing hardware, but the NICs are Intel PRO/1000 (em driver), so I'd think they're the most likely to just work. They're plugged into standard PCI (not PCIe) slots - maybe that's the ceiling I'm butting my head against?
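If it helps, I can also post what the bus reports for the NICs - I assume something like this will show how the ports enumerate and what they advertise:

pciconf -lv | grep -A 4 '^em'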

Any advice is most appreciated!


PCI is very bandwidth constrained compared to PCIe - and with 4 GbE NICs on PCI you're almost certainly running into those constraints.

I'd replace those old PRO/1000 cards with either a couple of Intel i21x series cards (if you only need a couple of ports) or, if you really need 4 ports, something like an i350-T4 - again, well-supported NICs for FreeBSD.

You'll also probably want to look at enabling RSS once you've got the cards replaced - the i7-3770 has pretty low single-core perf by modern standards, and without RSS you can end up limited by that.
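If/when you go the RSS route, it's set via System > Settings > Tunables - from memory something like this (values assume 4 queues; double-check the opnsense RSS docs before relying on my recollection):

net.inet.rss.enabled = 1
net.inet.rss.bits = 2
net.isr.bindthreads = 1
net.isr.maxthreads = -1

Then reboot, and 'netstat -Q' should show work spread across the netisr queues.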

Found a block diagram for a 'typical' H77/Z77 board - the 2 PCI slots are off a PCIe-to-PCI bridge chip, and the upstream link is PCIe 3.0 x1 - that should be heaps for 4 x GbE links (presuming the bridge chip isn't utter rubbish).

I'd check core loading on the firewall when running a speed test from a LAN client to WAN and see if you're saturating a single core (ssh in to opnsense, run 'top -P', and watch the 'interrupt' and 'idle' columns).

I dug up the PRO/1000 MT dual-port spec PDF - no mention of RSS on it at all that I can see. DragonFly BSD has a separate 'emx' driver with RSS support for some of those older cards, but it doesn't list the 82546, just newer variants (https://man.dragonflybsd.org/?command=emx&section=4). Suspect you need a new NIC.

Could you please reverse the iperf3 connection with '-R' so the server sends? I'm curious about the speed.

iperf3 -R -c 172.30.30.1 -p 34102

I looked at photos of the mainboard and the PCI slots seem to be PCI 2.3 (32-bit, 33 MHz, 0.133 GByte/s, 5 V), while the NICs are probably PCI-X 1.0 (64-bit, 66 MHz, 0.533 GByte/s, 3.3 V).

PCI-X is backward compatible, which is why it works, but there must be some overhang from the PCI-X cards at the end of the slot - could you please take a photo?

So the theoretical bandwidth (unidirectional) = 0.133 GByte/s * 8 = 1.064 Gbit/s, shared by 2 ports = 0.532 Gbit/s per port.

A PCI Express to PCI-X adapter would be nice, but these only seem to exist as PCIe to 32-bit 5 V PCI adapters.

I seem to recall that all PCI slots share the same bus bandwidth instead of having separate lanes like PCIe, so this problem would remain regardless of whether PCI or PCI-X NICs are used.

I'm running a 4th-gen i5 with a PCIe quad NIC along with 10G and 2.5G NICs, and I'm able to pull full gig speeds (higher where applicable), so I don't think the i7 is the bottleneck unless there's IDS/IPS, etc.

Quote from: johndchch on September 18, 2023, 11:10:13 AM
PCI is very bandwidth constrained compared to PCIe - and with 4 GbE NICs on PCI you're almost certainly running into those constraints.

I scraped this from Google, and then did the math to clean it up and include all cases. Supposedly:

PCI 32-bit, 33 MHz: 1067 Mbit/s or 133 MB/s
PCI 32-bit, 66 MHz: 2128 Mbit/s or 266 MB/s
PCI 64-bit, 33 MHz: 2128 Mbit/s or 266 MB/s
PCI 64-bit, 66 MHz: 4264 Mbit/s or 533 MB/s

What Gig-E (1000base-X) should be: 1000 Mbit/s or 125 MB/s
What I'm getting: 537 Mbits/s or 67 MB/s


So if 1 Gb is 125 MB/s, the worst case above (admittedly a theoretical maximum) of 133 MB/s is what the slot can deliver, so I wouldn't think I'm running into that. Even less likely if it's a 66 MHz or 64-bit slot.

More below....

Quote from: johndchch on September 18, 2023, 11:25:48 AM
Found a block diagram for a 'typical' H77/Z77 board - the 2 PCI slots are off a PCIe-to-PCI bridge chip, and the upstream link is PCIe 3.0 x1 - that should be heaps for 4 x GbE links (presuming the bridge chip isn't utter rubbish).

I'd check core loading on the firewall when running a speed test from a LAN client to WAN and see if you're saturating a single core (ssh in to opnsense, run 'top -P', and watch the 'interrupt' and 'idle' columns).

I dug up the PRO/1000 MT dual-port spec PDF - no mention of RSS on it at all that I can see. DragonFly BSD has a separate 'emx' driver with RSS support for some of those older cards, but it doesn't list the 82546, just newer variants (https://man.dragonflybsd.org/?command=emx&section=4). Suspect you need a new NIC.

My board uses the B75 chipset. Slots are as follows:

1 PCI Express 3.0/2.0 x16 slot (PCIe 3.0 speed is only supported by Intel 3rd-gen Core processors)
1 PCI Express 2.0 x4 slot
2 PCI slots (I don't know whether these are 32- or 64-bit, or 33 or 66 MHz)


Perhaps important detail:

Due to the 2U chassis I'm using, expansion cards can only be installed horizontally in a 2-slot riser card. This riser is bolted to the case in such a way that the only slot I can use is PCI2. Below is a picture - the riser card is in PCI2, and on the riser are two PCI slots (standard PCI, not PCIe). In each of those two slots is a 2-port Intel PRO/1000 NIC (more on those below). In addition, a cable ending in a paddleboard runs from the riser card and plugs into PCI1 on the mainboard, only necessary when using two cards in the riser. I haven't rung out the pins, but I suspect that secondary cable/paddleboard is primarily for additional power. I would expect the two cards to share an interrupt, but perhaps the paddleboard takes care of that too.

The NICs I am using are Intel PRO/1000 MT, and they are recognized by FreeBSD as:

em0: <Intel(R) Legacy PRO/1000 MT 82546EB (Copper)> port 0xc0c0-0xc0ff mem 0xf7c60000-0xf7c7ffff irq 19 at device 0.0 on pci7
em1: <Intel(R) Legacy PRO/1000 MT 82546EB (Copper)> port 0xc080-0xc0bf mem 0xf7c40000-0xf7c5ffff irq 16 at device 0.1 on pci7
em2: <Intel(R) Legacy PRO/1000 MT 82546EB (Copper)> port 0xc040-0xc07f mem 0xf7c20000-0xf7c3ffff irq 16 at device 1.0 on pci7
em3: <Intel(R) Legacy PRO/1000 MT 82546EB (Copper)> port 0xc000-0xc03f mem 0xf7c00000-0xf7c1ffff irq 17 at device 1.1 on pci7


I will do the additional iperf test you mention as well in a subsequent post. See attached picture below where the paddleboard is in PCIE1.



Thanks!

Quote from: vpx on September 19, 2023, 03:23:49 PM
Could you please reverse the iperf3 connection with '-R' so the server sends? I'm curious about the speed.

iperf3 -R -c 172.30.30.1 -p 34102

Per your request, here's an iperf3 run with the -R option:

C:\Users\Admin\Downloads\iperf-3.1.3-win64\iperf-3.1.3-win64>iperf3 -R -c 172.30.30.1 -p 23175
Connecting to host 172.30.30.1, port 23175
Reverse mode, remote host 172.30.30.1 is sending
[  4] local 172.30.30.40 port 56999 connected to 172.30.30.1 port 23175
[ ID] Interval           Transfer     Bandwidth
[  4]   0.00-1.00   sec  30.2 MBytes   254 Mbits/sec
[  4]   1.00-2.00   sec  30.1 MBytes   253 Mbits/sec
[  4]   2.00-3.00   sec  30.1 MBytes   253 Mbits/sec
[  4]   3.00-4.00   sec  30.1 MBytes   253 Mbits/sec
[  4]   4.00-5.00   sec  30.1 MBytes   252 Mbits/sec
[  4]   5.00-6.00   sec  30.2 MBytes   254 Mbits/sec
[  4]   6.00-7.00   sec  30.1 MBytes   253 Mbits/sec
[  4]   7.00-8.00   sec  30.1 MBytes   252 Mbits/sec
[  4]   8.00-9.00   sec  30.1 MBytes   252 Mbits/sec
[  4]   9.00-10.00  sec  30.9 MBytes   259 Mbits/sec
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec   302 MBytes   254 Mbits/sec    0             sender
[  4]   0.00-10.00  sec   302 MBytes   254 Mbits/sec                  receiver

iperf Done.